Is _mm_broadcast_ss faster than _mm_set1_ps?
Question
Is this code
float a = ...;
__m128 b = _mm_broadcast_ss(&a);
always faster than this code
float a = ...;
__m128 b = _mm_set1_ps(a);
?
What if a is defined as static const float a = ... rather than float a = ...?
Recommended answer
_mm_broadcast_ss is likely to be faster than _mm_set1_ps. The former translates into a single instruction (VBROADCASTSS), while the latter is emulated with multiple instructions (probably a MOVSS followed by a shuffle). However, _mm_broadcast_ss requires the AVX instruction set, whereas _mm_set1_ps requires only SSE. When AVX is enabled at compile time, the compiler may also emit VBROADCASTSS for _mm_set1_ps, so the difference can disappear in practice.
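To make the comparison concrete, here is a minimal sketch in C that wraps each intrinsic in a small function and stores the resulting vector to a plain array. The helper names splat_set1, splat_broadcast and have_avx are hypothetical, chosen only for illustration. The sketch assumes GCC or Clang on x86: it uses the target("avx") function attribute so that the file builds without -mavx, and __builtin_cpu_supports to check for AVX at run time. Both are compiler extensions, not part of the intrinsics API.

```c
#include <immintrin.h>  /* SSE and AVX intrinsics */

/* SSE only: the compiler decides how to replicate x into all four lanes
   (for example MOVSS followed by a shuffle). */
static void splat_set1(float x, float out[4])
{
    _mm_storeu_ps(out, _mm_set1_ps(x));
}

/* AVX: _mm_broadcast_ss corresponds to VBROADCASTSS with a memory operand.
   Assumption: the target attribute is a GCC/Clang extension that enables
   AVX code generation for this one function. */
__attribute__((target("avx")))
static void splat_broadcast(const float *p, float out[4])
{
    _mm_storeu_ps(out, _mm_broadcast_ss(p));
}

/* Returns 1 if the running CPU reports AVX support, 0 otherwise.
   Assumption: __builtin_cpu_supports is a GCC/Clang builtin. */
static int have_avx(void)
{
    return __builtin_cpu_supports("avx") ? 1 : 0;
}
```

Call splat_broadcast only after have_avx() returns 1, since executing an AVX instruction on a CPU without AVX raises an illegal-instruction fault. Both functions fill all four lanes with the same value, so any performance difference comes down to the instructions the compiler chooses. Inspecting the generated assembly (for example with -S) is the reliable way to see what a given compiler and flag set produces.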