Is it possible to cast floats directly to __m128 if they are 16-byte aligned?

Question

Is it safe/possible/advisable to cast a pointer to float data directly to a __m128 pointer if the data is 16-byte aligned?
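
A minimal C11 sketch of the two options being compared, assuming an x86/x86-64 target with SSE and a buffer declared with _Alignas(16). The helper names hsum4, sum4_cast and sum4_load are hypothetical. Both variants read the same four floats: one dereferences the pointer after casting it to const __m128 *, the other goes through _mm_load_ps.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_load_ps, _mm_store_ps */

/* Hypothetical helper: add up the four lanes by storing them to an aligned temporary. */
static float hsum4(__m128 v)
{
    _Alignas(16) float tmp[4];
    _mm_store_ps(tmp, v);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}

/* Variant A: reinterpret the float pointer as a __m128 pointer and dereference it.
   Only valid when p is 16-byte aligned. */
static float sum4_cast(const float *p)
{
    assert(((uintptr_t)p & 15u) == 0);
    __m128 v = *(const __m128 *)p;
    return hsum4(v);
}

/* Variant B: the intrinsic intended for an aligned load; same alignment requirement. */
static float sum4_load(const float *p)
{
    assert(((uintptr_t)p & 15u) == 0);
    __m128 v = _mm_load_ps(p);
    return hsum4(v);
}
```

On a 16-byte-aligned address, compilers typically generate the same aligned load for both variants, so the cast buys little over the intrinsic; _mm_load_ps simply states the intent and the alignment requirement explicitly.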

I noticed that using _mm_load_ps and _mm_store_ps to "wrap" a raw array adds significant overhead.

What are the potential pitfalls I should be aware of?
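
The main pitfall is alignment. Dereferencing a __m128 pointer, like calling _mm_load_ps, assumes a 16-byte boundary, and the compiler may emit an aligned instruction such as movaps that faults on any other address. An aligned base array does not make an offset like buf + 1 aligned. A sketch under the same assumptions (C11, x86/x86-64 with SSE; is_aligned16, load4_any and sum_floats are hypothetical names) that checks the address and falls back to the unaligned load _mm_loadu_ps:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* _mm_load_ps, _mm_loadu_ps, _mm_add_ps, _mm_setzero_ps */

/* Returns 1 if p sits on a 16-byte boundary, which _mm_load_ps
   and a dereferenced (__m128 *) both require. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Load four floats starting at p, whatever its alignment:
   _mm_loadu_ps has no alignment requirement. */
static __m128 load4_any(const float *p)
{
    return is_aligned16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
}

/* Sum n floats four at a time. Sketch only: n must be a multiple of 4
   (no scalar tail loop). */
static float sum_floats(const float *p, size_t n)
{
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, load4_any(p + i));
    _Alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```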

Edit:

There is actually no overhead in using the load and store instructions; I got some numbers mixed up, and that is why I seemed to get better performance. Even though I was able to do some horrendous mangling of raw memory addresses through a __m128 instance, the test took twice as long to complete without the _mm_load_ps instruction, probably because it fell back to some fail-safe code path.

Answer

What makes you think that _mm_load_ps and _mm_store_ps "add a significant overhead"? This is the normal way to load/store float data to/from SSE registers when the source/destination is memory (and any other method eventually boils down to this anyway).
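
For illustration, a typical load/compute/store loop written with these intrinsics, assuming 16-byte-aligned arrays whose length is a multiple of 4 (add_arrays is a hypothetical name; C11, x86/x86-64 with SSE):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* _mm_load_ps, _mm_add_ps, _mm_store_ps */

/* y[i] = a[i] + b[i] for n floats, four at a time.
   Assumes y, a and b are 16-byte aligned and n is a multiple of 4. */
static void add_arrays(float *y, const float *a, const float *b, size_t n)
{
    assert(((uintptr_t)a & 15u) == 0);
    assert(((uintptr_t)b & 15u) == 0);
    assert(((uintptr_t)y & 15u) == 0);
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);          /* memory -> register */
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(y + i, _mm_add_ps(va, vb)); /* register -> memory */
    }
}
```

With optimization enabled, compilers generally turn each _mm_load_ps/_mm_store_ps into a single movaps or fold the load into the addps operand, which fits the asker's later finding that the intrinsics add no overhead of their own.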
