Is it possible to cast floats directly to __m128 if they are 16-byte aligned?
Problem description
Is it safe/possible/advisable to cast floats directly to __m128
if they are 16-byte aligned?
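To make the question concrete, the two styles being compared might look like the sketch below. It assumes an x86 target with SSE and a C11 compiler (for _Alignas); add4_cast and add4_intrin are hypothetical names, not taken from the original post.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps */

/* Adds the four floats at b into a by reinterpreting the float pointers
   as __m128 pointers. Only valid when both are 16-byte aligned. */
static void add4_cast(float *a, const float *b)
{
    assert(((uintptr_t)a & 15u) == 0 && ((uintptr_t)b & 15u) == 0);
    __m128 *va = (__m128 *)a;
    const __m128 *vb = (const __m128 *)b;
    *va = _mm_add_ps(*va, *vb);   /* dereference = aligned 16-byte load/store */
}

/* The same operation written with the explicit load/store intrinsics. */
static void add4_intrin(float *a, const float *b)
{
    __m128 va = _mm_load_ps(a);   /* aligned load: a must be 16-byte aligned */
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(a, _mm_add_ps(va, vb));
}
```

In GCC's and Clang's xmmintrin.h, _mm_load_ps is itself defined as a dereference of an __m128 pointer, so with optimization enabled both functions typically compile to the same aligned moves.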
I noticed using _mm_load_ps
and _mm_store_ps
to "wrap" a raw array adds a significant overhead.
What are potential pitfalls I should be aware of?
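The main pitfall is alignment. Dereferencing a cast __m128 pointer, like _mm_load_ps, assumes a 16-byte boundary and typically compiles to an aligned move that can fault at runtime when the address is misaligned, for example at buf + 1 in a float array. A sketch of guarding against that, assuming GCC or Clang with SSE and C11; is_aligned16, alloc_floats16 and sum4_any are hypothetical helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* on GCC and Clang this also provides _mm_malloc/_mm_free */

/* Returns nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Allocates n floats on a 16-byte boundary; release with _mm_free. */
static float *alloc_floats16(size_t n)
{
    float *p = (float *)_mm_malloc(n * sizeof(float), 16);
    assert(p == NULL || is_aligned16(p));
    return p;
}

/* Sums four floats starting at p, which need NOT be aligned:
   _mm_loadu_ps has no alignment requirement, unlike _mm_load_ps
   or dereferencing p after casting it to (__m128 *). */
static float sum4_any(const float *p)
{
    __m128 v = _mm_loadu_ps(p);
    _Alignas(16) float tmp[4];
    _mm_store_ps(tmp, v);   /* tmp is aligned, so the aligned store is fine */
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```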
Edit:
There is actually no overhead in using the load and store instructions; I got some numbers mixed up, and that is why I appeared to get better performance. Even though I was able to do some HORRENDOUS mangling with raw memory addresses in a __m128
instance, when I ran the test it took TWICE AS LONG to complete without the _mm_load_ps
instruction, probably falling back to some fail-safe code path.
Recommended answer
What makes you think that _mm_load_ps
and _mm_store_ps
"add a significant overhead"? This is the normal way to load/store float data to/from SSE registers, assuming the source/destination is memory (and any other method eventually boils down to this anyway).
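As a minimal sketch of that normal pattern, assuming a 16-byte-aligned buffer, SSE and C11, here is a loop that scales an array in place; scale_inplace is a hypothetical example function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Multiplies n floats in place by s. Assumes data is 16-byte aligned;
   the remainder (n % 4) is handled with scalar code. */
static void scale_inplace(float *data, size_t n, float s)
{
    assert(((uintptr_t)data & 15u) == 0);
    const __m128 vs = _mm_set1_ps(s);   /* broadcast s to all four lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_load_ps(data + i);           /* memory -> register */
        _mm_store_ps(data + i, _mm_mul_ps(v, vs));  /* register -> memory */
    }
    for (; i < n; ++i)   /* scalar tail for leftover elements */
        data[i] *= s;
}
```

Each vector iteration performs one load and one store per four floats, which is the memory traffic any approach (cast or intrinsic) has to perform anyway; data + i stays aligned because i advances in steps of four.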