Multiplying vector by constant using SSE

Question

I have some code that operates on 4D vectors and I'm currently trying to convert it to use SSE. I'm using both clang and gcc on 64-bit Linux.
Operating only on vectors is fine; I've grasped that. But now comes a part where I have to multiply an entire vector by a single constant, turning something like this:

float y[4];
float a1 =   25.0/216.0;  

for(j=0; j<4; j++){  
    y[j] = a1 * x[j];  
} 

into something like this:

float4 y;
float a1 =   25.0/216.0;  

y = a1 * x;  

where:

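/* Element type must be float: a 16-byte vector then holds four lanes. */
typedef float v4sf __attribute__ ((vector_size(4*sizeof(float))));

typedef union float4{
    v4sf v;
    /* Anonymous struct so x, y, z, w name the four lanes instead of all aliasing lane 0. */
    struct { float x, y, z, w; };
} float4;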

This of course will not work, because I'm trying to multiply incompatible data types.
Now, I could do something like:

float4 a1 = (v4sf){25.0/216.0, 25.0/216.0, 25.0/216.0, 25.0/216.0}

but that just makes me feel silly, even if I write a macro to do it. Also, I'm pretty certain that it will not result in very efficient code.

Googling this brought no clear answers (see "Load constant floats into SSE registers").

So what is the best way to multiply an entire vector by the same constant?

Recommended answer

Just use intrinsics and let the compiler take care of it, e.g.

__m128 vb = _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f); // lanes 3..0 = 1.0, 2.0, 3.0, 4.0 (arguments run from high lane to low)
__m128 va = _mm_set1_ps(25.0f / 216.0f);        // all four lanes = 25.0f / 216.0f
__m128 vc = _mm_mul_ps(va, vb);                 // vc = va * vb, lane by lane
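
For reference, the three intrinsics above can be wrapped into a small function that works on plain float arrays. This is only a sketch: the name scale4 is hypothetical and not part of the original answer, and it uses unaligned loads and stores on the assumption that the caller's arrays are not guaranteed to be 16-byte aligned.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_set1_ps, _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps */

/* Hypothetical helper: dst[i] = k * src[i] for i = 0..3. */
static void scale4(float dst[4], const float src[4], float k)
{
    assert(dst && src);
    __m128 vk = _mm_set1_ps(k);             /* broadcast k into all four lanes */
    __m128 vx = _mm_loadu_ps(src);          /* unaligned load of four floats   */
    _mm_storeu_ps(dst, _mm_mul_ps(vk, vx)); /* lane-wise multiply, then store  */
}
```

With the question's variables, the whole loop would then become a single call such as scale4(y, x, 25.0f / 216.0f). If the arrays are known to be 16-byte aligned, _mm_load_ps and _mm_store_ps could be used instead.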

If you look at the generated code, it should be quite efficient: the 25.0f / 216.0f value will be calculated at compile time, and _mm_set1_ps usually generates reasonably efficient code for splatting a scalar across a vector.

Note also that you normally only initialise a constant vector such as va just once, prior to entering a loop where you will be doing most of the actual work, so it tends not to be performance-critical.
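
A minimal sketch of that pattern, assuming the data is a flat float array whose length is a multiple of four (the function name scale_array and that length restriction are assumptions made for illustration, not something from the original answer):

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: multiply n floats in place by k.
   Assumes n is a multiple of 4; a real version would handle the tail. */
static void scale_array(float *data, size_t n, float k)
{
    assert(n % 4 == 0);
    const __m128 vk = _mm_set1_ps(k);       /* constant vector built once, outside the loop */
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);  /* four floats per iteration */
        _mm_storeu_ps(data + i, _mm_mul_ps(vk, v));
    }
}
```

The broadcast happens once per call, so its cost is spread over every iteration of the loop, which is why the exact instruction sequence the compiler picks for _mm_set1_ps rarely matters.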
