Is it possible to cast floats directly to __m128 if they are 16 byte aligned?


Question

Is it safe, possible, or advisable to cast floats directly to __m128 if they are 16-byte aligned?

I noticed that using _mm_load_ps and _mm_store_ps to "wrap" a raw array adds significant overhead.

What potential pitfalls should I be aware of?

Edit:

There is actually no overhead in using the load and store instructions. I got some numbers mixed up, and that is why I appeared to get better performance. Even though I was able to do some HORRENDOUS mangling with raw memory addresses in a __m128 instance, when I ran the test it took TWICE AS LONG to complete without the _mm_load_ps instruction, probably because it fell back to some fail-safe code path.

Recommended answer

What makes you think that _mm_load_ps and _mm_store_ps "add a significant overhead"? Assuming the source or destination is memory, this is the normal way to load float data into SSE registers and store it back, and any other method eventually boils down to the same thing anyway.
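To make the comparison concrete, here is a minimal sketch of both approaches side by side. The function names add_with_load_store and add_with_cast are hypothetical and not from the original post. Both versions assume the pointers are 16-byte aligned and that n is a multiple of 4. The pointer-cast version also assumes a compiler whose __m128 type is allowed to alias float storage (GCC and Clang declare it with the may_alias attribute); treat it as an illustration, not a recommendation.

```c
#include <stddef.h>     /* size_t */
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps */

/* Element-wise sum using explicit aligned loads and stores.
   Assumption: a, b and out are 16-byte aligned, n is a multiple of 4. */
static void add_with_load_store(const float *a, const float *b,
                                float *out, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);   /* aligned load of 4 floats */
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(out + i, _mm_add_ps(va, vb));  /* aligned store */
    }
}

/* Same computation, reinterpreting the float arrays as arrays of __m128.
   Same alignment and length assumptions as above; an unaligned pointer
   here is just as invalid as passing it to _mm_load_ps. */
static void add_with_cast(const float *a, const float *b,
                          float *out, size_t n)
{
    const __m128 *va = (const __m128 *)a;
    const __m128 *vb = (const __m128 *)b;
    __m128 *vo = (__m128 *)out;

    for (size_t i = 0; i < n / 4; ++i) {
        /* Dereferencing still has to move data between memory and an
           SSE register, so the compiler emits loads and stores anyway. */
        vo[i] = _mm_add_ps(va[i], vb[i]);
    }
}
```

Both functions should produce identical results for arrays declared with, for example, _Alignas(16). The cast does not avoid the memory access; it only hides it, which is consistent with the answer's point that every method ends up as a load or store.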
