Multiplying vector by constant using SSE


Question

I have some code that operates on 4D vectors and I'm currently trying to convert it to use SSE. I'm using both clang and gcc on 64-bit Linux.

Operating only on vectors is fine; I've grasped that. But now comes a part where I have to multiply an entire vector by a single constant. I want to turn something like this:

float y[4];
float a1 =   25.0/216.0;  

for(j=0; j<4; j++){  
    y[j] = a1 * x[j];  
} 

into something like this:

float4 y;
float a1 =   25.0/216.0;  

y = a1 * x;  

where:

typedef float v4sf __attribute__ ((vector_size(4*sizeof(float))));

typedef union float4 {
    v4sf v;
    struct { float x, y, z, w; };  /* anonymous struct, so x, y, z, w overlay the four lanes of v */
} float4;

This of course will not work, because I'm trying to multiply incompatible data types.
Now, I could do something like:

float4 a1 = (v4sf){25.0/216.0, 25.0/216.0, 25.0/216.0, 25.0/216.0};

but that just makes me feel silly, even if I write a macro to do it. Also, I'm pretty certain that will not result in very efficient code.

Googling this brought no clear answers (see Load constant floats into SSE registers).

So what is the best way to multiply an entire vector by the same constant?

Recommended answer

Just use intrinsics and let the compiler take care of it, e.g.

#include <xmmintrin.h> // SSE intrinsics

__m128 vb = _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f); // vb = { 1.0, 2.0, 3.0, 4.0 }, listed from highest lane to lowest
__m128 va = _mm_set1_ps(25.0f / 216.0f); // va = { 25.0f / 216.0f, 25.0f / 216.0f, 25.0f / 216.0f, 25.0f / 216.0f }
__m128 vc = _mm_mul_ps(va, vb); // vc = va * vb

If you look at the generated code it should be quite efficient: the 25.0f / 216.0f value will be calculated at compile time, and _mm_set1_ps usually generates reasonably efficient code for splatting a scalar across a vector.
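To make the splat and the lane ordering concrete, here is a minimal sketch that stores the vectors back to plain float arrays so they can be inspected. The helper names splat_to_array and set_ps_to_array are hypothetical and used only for illustration; the intrinsics themselves come from <xmmintrin.h>.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_set1_ps, _mm_set_ps, _mm_storeu_ps */

/* Hypothetical helper: broadcast k into all four lanes and write them to out. */
static void splat_to_array(float k, float out[4])
{
    _mm_storeu_ps(out, _mm_set1_ps(k));  /* unaligned store, so out needs no special alignment */
}

/* Hypothetical helper: show how the arguments of _mm_set_ps map to memory. */
static void set_ps_to_array(float out[4])
{
    /* _mm_set_ps takes its arguments from the highest lane down to the lowest,
       so after the store the array holds 4, 3, 2, 1 in memory order. */
    _mm_storeu_ps(out, _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f));
}
```

If the memory order matters, _mm_setr_ps takes its arguments in the reverse (lowest lane first) order.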

Note also that you normally initialise a constant vector such as va just once, prior to entering the loop where most of the actual work is done, so it tends not to be performance-critical.
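As a rough sketch of that pattern, the function below builds the constant vector once and then reuses it for every group of four floats. The name scale_array and its signature are assumptions made for this example, not something from the question; unaligned loads and stores are used so that no alignment is assumed for the arrays.

```c
#include <stddef.h>     /* size_t */
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: y[i] = a * x[i] for every i in [0, n). */
static void scale_array(float *y, const float *x, size_t n, float a)
{
    const __m128 va = _mm_set1_ps(a);  /* constant vector, built once before the loop */
    size_t i = 0;

    /* Main loop: four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        _mm_storeu_ps(y + i, _mm_mul_ps(va, vx));
    }

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; i++) {
        y[i] = a * x[i];
    }
}
```

For the 4D case in the question, calling scale_array(y, x, 4, 25.0f / 216.0f) would run the vector loop exactly once.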
