Visual Studio 2017: _mm_load_ps often compiled to movups

Views: 212 Published: 2020/9/12 22:42:09 Tags: c++ assembly sse intrinsics visual-studio-2017

Question

I am looking at the generated assembly for my code (using Visual Studio 2017) and noticed that _mm_load_ps is often (always?) compiled to movups.

The data I'm using _mm_load_ps on is defined like this:

struct alignas(16) Vector {
    float v[4];
};

// often embedded in other structs like this
struct AABB {
    Vector min;
    Vector max;
    bool intersection(/* parameters */) const;
};

Now when I'm using this construct, the following will happen:

// this code
__m128 bb_min = _mm_load_ps(min.v);

// generates this
movups  xmm4, XMMWORD PTR [r8]

I'm expecting movaps because of alignas(16). Do I need something else to convince the compiler to use movaps in this case?

My question is different from this question because I'm not getting any crashes. The struct is explicitly aligned and I'm also using aligned allocation. Rather, I'm curious why the compiler turns _mm_load_ps (the intrinsic for aligned memory) into movups. If I know the struct was allocated at an aligned address and I access it through this, it would be safe to use movaps, right?
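
The alignment assumption in the question can be checked rather than taken on trust. The following is a minimal sketch, not part of the original post: it repeats the two structs (without the member function) and adds a hypothetical helper, assert_aligned16, that could be called on this inside a member function in debug builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct alignas(16) Vector {
    float v[4];
};

struct AABB {
    Vector min;
    Vector max;
};

// alignas(16) on Vector propagates to every struct that embeds it.
static_assert(alignof(Vector) == 16, "Vector must be 16-byte aligned");
static_assert(alignof(AABB) == 16, "AABB takes the alignment of its members");
static_assert(offsetof(AABB, max) == 16, "max starts on a 16-byte boundary");

// Hypothetical debug helper (not from the original post): aborts in debug
// builds when p is not on a 16-byte boundary.
inline void assert_aligned16(const void* p) {
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
}
```

If these checks pass for every AABB the program creates, a movaps on min.v or max.v would not fault, whichever instruction the compiler actually emits.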

Recommended answer

On recent versions of Visual Studio and the Intel Compiler (recent as post-2013?), the compiler rarely ever generates aligned SIMD load/stores anymore.

When compiling for AVX or higher:

The Microsoft compiler (>VS2013?) doesn't generate aligned loads. But it still generates aligned stores.
The Intel compiler (> Parallel Studio 2012?) doesn't do it at all anymore. But you'll still see them in ICC-compiled binaries inside their hand-optimized libraries like memset().
As of GCC 6.1, it still generates aligned load/stores when you use the aligned intrinsics.

The compiler is allowed to do this because it's not a loss of functionality when the code is written correctly. All processors starting from Nehalem have no penalty for unaligned load/stores when the address is aligned.

Microsoft's stance on this issue is that it "helps the programmer by not crashing". Unfortunately, I can't find the original source for this statement from Microsoft anymore. In my opinion, this achieves the exact opposite of that because it hides misalignment penalties. From the correctness standpoint, it also hides incorrect code.
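
Because the emitted movups never faults, one way to keep misaligned pointers from going unnoticed is to check the address explicitly before loading. This is a minimal sketch under that assumption; load4_checked is a hypothetical wrapper name, not something the compiler or the original answer provides, and the check disappears when NDEBUG is defined.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE: __m128, _mm_load_ps, _mm_store_ps

// Hypothetical wrapper: verifies 16-byte alignment in debug builds, then
// performs the aligned-load intrinsic. The compiler may still emit movups.
inline __m128 load4_checked(const float* p) {
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0 &&
           "load4_checked: pointer is not 16-byte aligned");
    return _mm_load_ps(p);
}
```

Calling load4_checked(min.v) in place of _mm_load_ps(min.v) keeps the release code unchanged while restoring a failure on misaligned input in debug builds.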

Whatever the case is, unconditionally using unaligned load/stores does simplify the compiler a bit.

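Update:

As of Parallel Studio 2018, the Intel compiler no longer generates aligned moves at all, even for pre-Nehalem targets.
As of Visual Studio 2017, the Microsoft compiler also no longer generates aligned moves, even when targeting pre-AVX hardware.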

Both cases result in inevitable performance degradation on older processors. But it seems that this is intentional as both Intel and Microsoft no longer care about old processors.

The only load/store intrinsics that are immune to this are the non-temporal load/stores. There is no unaligned equivalent of them, so the compiler has no choice.

So if you just want to test your code for correctness, you can swap the regular load/store intrinsics for their non-temporal counterparts. But be careful not to let something like this slip into production code, since NT load/stores (NT stores in particular) are a double-edged sword that can hurt you if you don't know what you're doing.
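
The substitution described above could look like the following for stores. This is a sketch only: CHECK_ALIGNMENT is a hypothetical build flag and the function names are made up for illustration. _mm_stream_ps maps to movntps, which has no unaligned form, so a misaligned destination faults in the checked variant.

```cpp
#include <xmmintrin.h>  // SSE: _mm_store_ps, _mm_stream_ps, _mm_sfence

// Test-only variant: a non-temporal store faults on a misaligned address.
inline void store4_checked(float* dst, __m128 value) {
    _mm_stream_ps(dst, value);
    _mm_sfence();  // order the non-temporal store before later stores
}

// Hypothetical switch: define CHECK_ALIGNMENT in test builds only.
inline void store4(float* dst, __m128 value) {
#ifdef CHECK_ALIGNMENT
    store4_checked(dst, value);
#else
    _mm_store_ps(dst, value);  // the compiler may emit movups for this
#endif
}
```

Keeping the non-temporal path behind a flag that is never defined in release builds is one way to follow the warning about not shipping NT stores by accident.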
