Is _mm_broadcast_ss faster than _mm_set1_ps?

Question

Is this code

float a = ...;
__m128 b = _mm_broadcast_ss(&a);

always faster than this code

float a = ...;
__m128 b = _mm_set1_ps(a);

?

What if a is defined as static const float a = ... rather than float a = ...?

Recommended answer

_mm_broadcast_ss is likely to be faster than _mm_set1_ps. The former translates into a single instruction (VBROADCASTSS), while the latter may be emulated using multiple instructions when only SSE is available (probably a MOVSS followed by a shuffle). However, _mm_broadcast_ss requires the AVX instruction set, while _mm_set1_ps needs only SSE. Note that _mm_set1_ps is a convenience intrinsic rather than a fixed instruction, so a compiler targeting AVX is free to emit VBROADCASTSS for it as well, in which case the two would perform the same.
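To make the comparison concrete, here is a minimal sketch that calls both intrinsics and checks that they produce the same four lanes. The helper names (splat_set1, splat_broadcast, broadcasts_match) are hypothetical and not part of the original question. The sketch assumes GCC or Clang on x86-64: the target("avx") attribute and __builtin_cpu_supports are extensions of those compilers, used here so the file builds without -mavx and skips the AVX path on CPUs that lack it.

```c
#include <immintrin.h>
#include <string.h>

/* SSE path: _mm_set1_ps takes the scalar by value.
   The compiler decides how to materialize the broadcast. */
static void splat_set1(float x, float out[4]) {
    _mm_storeu_ps(out, _mm_set1_ps(x));
}

/* AVX path: _mm_broadcast_ss takes a pointer and broadcasts from memory.
   target("avx") is a GCC/Clang extension (an assumption of this sketch)
   that enables AVX for this one function only. */
__attribute__((target("avx")))
static void splat_broadcast(const float *p, float out[4]) {
    _mm_storeu_ps(out, _mm_broadcast_ss(p));
}

/* Returns 1 if both paths yield identical lanes, 0 if they differ,
   and -1 if the CPU reports no AVX support (broadcast path skipped). */
int broadcasts_match(float x) {
    float a[4], b[4];
    splat_set1(x, a);
    if (!__builtin_cpu_supports("avx"))
        return -1;
    splat_broadcast(&x, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

This only shows that the results agree; it says nothing about speed. To see what each intrinsic actually compiles to, one option is to build with and without AVX (for example gcc -O2 -S versus gcc -O2 -mavx -S) and compare the generated assembly for the two helpers.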
