SSE (SIMD): multiply vector by scalar

Problem description

A common operation in my program is scaling a vector by a scalar (V*s, e.g. [1,2,3,4]*2 == [2,4,6,8]). Is there an SSE (or AVX) instruction that does this, other than first loading the scalar into every position of a vector (e.g. _mm_set_ps(2,2,2,2)) and then multiplying?

This is what I do now:

__m128 _scalar = _mm_set_ps(s,s,s,s);
__m128 _result = _mm_mul_ps(_vector, _scalar);

I'm looking for something like...

__m128 _result = _mm_scale_ps(_vector, s);

Solution

Depending on your compiler, you may be able to improve the code generation a little by using _mm_set1_ps:

const __m128 scalar = _mm_set1_ps(s);
__m128 result = _mm_mul_ps(vector, scalar);

However, scalar constants like this should only need to be initialised once, outside any loops, so the performance cost should be irrelevant (unless the scalar value changes within the loop).
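
To illustrate the point about initialising the scalar once, here is a minimal sketch of scaling a whole array in place. The function name scale_floats and its parameters are hypothetical and not part of the original answer; the sketch assumes an x86 target with SSE and uses unaligned loads and stores so the array needs no special alignment.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_set1_ps, _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps */

/* Hypothetical helper (an assumption for illustration): multiplies
 * each of the n floats in data by s, in place. */
static void scale_floats(float *data, size_t n, float s)
{
    /* Broadcast s into all four lanes once, outside the loop. */
    const __m128 scalar = _mm_set1_ps(s);

    size_t i = 0;

    /* Main loop: process four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);  /* unaligned load of 4 floats */
        v = _mm_mul_ps(v, scalar);          /* lane-wise multiply */
        _mm_storeu_ps(data + i, v);         /* unaligned store of 4 floats */
    }

    /* Tail: handle the remaining 0 to 3 elements with scalar code. */
    for (; i < n; ++i) {
        data[i] *= s;
    }
}
```

With this shape the broadcast happens once per call rather than once per element, so its cost is spread over the whole array. Whether the compiler already produces equivalent code from a plain scalar loop is something to check in the generated assembly.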

As always, you should look at the code your compiler generates, and also try running your code under a decent profiler to see where the hotspots really are.
