How to solve the 32-byte-alignment issue for AVX load/store operations?

Views: 668 · Posted: 2016/10/19 20:59:23 · Tags: c++ c++11 vectorization sse avx


Problem description

I am having an alignment issue while using ymm registers, with some snippets of code that seem fine to me. Here is a minimal working example:
#include <iostream> 
#include <immintrin.h>

inline void ones(float *a)
{
     __m256 out_aligned = _mm256_set1_ps(1.0f);
     _mm256_store_ps(a,out_aligned);
}

int main()
{
     size_t ss = 8;
     float *a = new float[ss];
     ones(a);

     delete [] a;

     std::cout << "All Good!" << std::endl;
     return 0;
}
Certainly, sizeof(float) is 4 on my architecture (Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz), and I'm compiling with gcc using the -O3 -march=native flags. Of course, the error goes away with an unaligned memory access, i.e. specifying _mm256_storeu_ps. I also do not have this problem with xmm registers, i.e.
inline void ones_sse(float *a)
{
     __m128 out_aligned = _mm_set1_ps(1.0f);
     _mm_store_ps(a,out_aligned);
}
Am I doing anything foolish? What is the work-around for this?
Solution
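The standard allocators only guarantee alignment suitable for the widest standard type, alignof(max_align_t): typically 8B or 16B (16B on x86-64 Linux, for example), and never 32B as a guarantee. That is probably why the 16-byte _mm_store_ps version happens to work while the 32-byte _mm256_store_ps version faults.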
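To see what a given pointer actually provides, a small check like the following can help. This is a minimal sketch: is_aligned and default_alignment are hypothetical helper names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is a multiple of `alignment`.
// `alignment` is assumed to be a power of two (8, 16, 32, ...).
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// The alignment that plain malloc / new are required to provide for
// fundamental types. Anything stricter (such as 32B for AVX) needs one
// of the options discussed in this answer.
inline std::size_t default_alignment() {
    return alignof(std::max_align_t);
}
```

Calling is_aligned(a, 32) on the pointer returned by new float[ss] in the question would show whether that particular allocation happened to land on a 32-byte boundary; the result can vary from run to run.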

Options:

aligned_alloc: ISO C11, and available in some but not all C++ compilers. ISO C++ adopted it only in C++17 (as std::aligned_alloc); before that it was part of C11 only. (Commenters report it's unavailable in MSVC++, but see "best cross-platform method to get aligned memory" for a viable #ifdef for Windows.)
posix_memalign: Part of POSIX 2001, not any ISO C or C++ standard. Clunky prototype/interface compared to aligned_alloc.


#include <stdlib.h>
int posix_memalign(void **memptr, size_t alignment, size_t size);  // POSIX 2001
void *aligned_alloc(size_t alignment, size_t size);                // C11 (in C++ only since C++17, as std::aligned_alloc)
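As a rough illustration of how the two prototypes are used, here is a sketch that assumes a POSIX system whose <stdlib.h> exposes both functions to C++ code. The wrapper names alloc_floats_posix and alloc_floats_c11 are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>   // posix_memalign, aligned_alloc, free

// POSIX route: the pointer comes back through an out-parameter and the
// return value is an error code (0 on success). Release with free().
inline float* alloc_floats_posix(std::size_t n) {
    assert(n <= static_cast<std::size_t>(-1) / sizeof(float));  // no size overflow
    void* p = nullptr;
    if (posix_memalign(&p, 32, n * sizeof(float)) != 0) return nullptr;
    return static_cast<float*>(p);
}

// C11 route: the size is rounded up to a multiple of the alignment,
// because the original C11 wording requires that. Release with free().
inline float* alloc_floats_c11(std::size_t n) {
    assert(n <= (static_cast<std::size_t>(-1) - 31) / sizeof(float));
    std::size_t bytes = (n * sizeof(float) + 31) / 32 * 32;
    return static_cast<float*>(aligned_alloc(32, bytes));
}
```

Either pointer can be passed to _mm256_store_ps, and both are released with plain free.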



_mm_malloc: Available on any platform where _mm_whatever_ps is available, but you can't pass pointers from it to free; use _mm_free. On many C and C++ implementations _mm_free and free are compatible, but that's not guaranteed to be portable. (And unlike a missing aligned_alloc or posix_memalign, which shows up at compile time, mixing up _mm_free and free fails at run time.)
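One way to make the _mm_free requirement hard to forget is to tie it to the pointer with a custom deleter. This is only a sketch, assuming an x86 compiler that ships <immintrin.h>; MmFreeDeleter, AvxFloatBuffer and make_avx_buffer are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <immintrin.h>   // _mm_malloc, _mm_free

// Deleter that always releases through _mm_free, never free/delete.
struct MmFreeDeleter {
    void operator()(void* p) const noexcept { _mm_free(p); }
};

using AvxFloatBuffer = std::unique_ptr<float[], MmFreeDeleter>;

// Allocates n floats on a 32-byte boundary; the result holds nullptr
// if the allocation failed.
inline AvxFloatBuffer make_avx_buffer(std::size_t n) {
    assert(n > 0 && n <= static_cast<std::size_t>(-1) / sizeof(float));
    return AvxFloatBuffer(static_cast<float*>(_mm_malloc(n * sizeof(float), 32)));
}
```

With this, auto a = make_avx_buffer(8); gives a pointer (a.get()) that is suitable for _mm256_store_ps and is released correctly when it goes out of scope.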
In C++11 and later: use alignas(32) float avx_array[1234] as the first member of a struct/class (or on a plain array directly) so static and automatic storage objects of that type will have 32B alignment. The std::aligned_storage documentation has an example of this technique to explain what std::aligned_storage does.
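A minimal sketch of the alignas approach follows. Vec8 is a hypothetical type name, and ones mirrors the function from the question; the intrinsic path is only compiled when AVX is enabled (for example with -mavx or -march=native), with a scalar fallback otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// The alignas on the first member raises the alignment of the whole type.
struct Vec8 {
    alignas(32) float v[8];
};
static_assert(alignof(Vec8) == 32, "Vec8 objects start on a 32-byte boundary");
static_assert(sizeof(Vec8) == 32, "eight floats, no padding");

// Same job as ones() in the question, with the alignment precondition
// made explicit.
inline void ones(float* a) {
    assert(reinterpret_cast<std::uintptr_t>(a) % 32 == 0);
#if defined(__AVX__)
    _mm256_store_ps(a, _mm256_set1_ps(1.0f));   // aligned 32-byte store
#else
    for (int i = 0; i < 8; ++i) a[i] = 1.0f;    // scalar fallback without AVX
#endif
}
```

A local Vec8 x; followed by ones(x.v); is then safe, because the compiler places x on a 32-byte boundary.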

The alignas approach doesn't actually work for dynamically-allocated storage (like a std::vector<my_class_with_aligned_member_array>), at least before C++17 added over-aligned operator new; see "Making std::vector allocate aligned memory".
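For std::vector, one common workaround is a custom allocator that forwards to an aligned allocation function. The sketch below is not taken from the linked question; AlignedAllocator and AvxFloatVector are hypothetical names, and it assumes _mm_malloc/_mm_free from <immintrin.h> are available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>
#include <immintrin.h>   // _mm_malloc, _mm_free

// Minimal C++11 allocator that hands out Alignment-byte-aligned blocks.
template <class T, std::size_t Alignment>
struct AlignedAllocator {
    using value_type = T;

    // Needed explicitly because of the non-type template parameter.
    template <class U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() noexcept = default;
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        assert(n <= static_cast<std::size_t>(-1) / sizeof(T));  // no overflow
        void* p = _mm_malloc(n * sizeof(T), Alignment);
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { _mm_free(p); }
};

// Stateless allocators always compare equal.
template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

using AvxFloatVector = std::vector<float, AlignedAllocator<float, 32>>;
```

With this, v.data() of an AvxFloatVector is 32-byte aligned, including after reallocation. Only offsets that are multiples of 8 floats remain aligned, so element pointers in the middle of the vector still need care.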




And finally, the last option is so bad it's not even part of the list: allocate a larger buffer and do p+=31; p&=~31ULL with appropriate casting. It has too many drawbacks (hard to free, wastes memory) to be worth discussing, since aligned-allocation functions are available on every platform that supports Intel _mm256 intrinsics. But there are even library functions that will help you do this, IIRC.
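The library function half-remembered here may be std::align from <memory> (C++11); that identification is an assumption, not something the answer states. A sketch of the bump-and-mask step done through it, with align_floats_within as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>   // std::align

// Finds the first 32-byte-aligned position inside [raw, raw + raw_bytes)
// with room for n floats. Returns nullptr if the buffer is too small.
// The caller still owns `raw` and must free it through the original
// pointer, which is the "hard to free" drawback mentioned above.
inline float* align_floats_within(void* raw, std::size_t raw_bytes, std::size_t n) {
    assert(raw != nullptr);
    void* p = raw;
    std::size_t space = raw_bytes;
    return static_cast<float*>(std::align(32, n * sizeof(float), p, space));
}
```

To be sure the aligned block fits, the raw buffer needs up to 31 extra bytes beyond n * sizeof(float), which is the wasted memory the answer refers to.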

The requirement to use _mm_free instead of free probably exists to allow for the possibility of implementing _mm_malloc on top of a plain old malloc using this technique.
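To make that concrete, here is a sketch of how an aligned allocator could be layered on plain malloc. It is not the actual implementation of _mm_malloc in any compiler; sketch_aligned_malloc and sketch_aligned_free are hypothetical names, and the alignment is assumed to be a power of two no smaller than alignof(void*).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Over-allocates, rounds the address up, and stashes the pointer that
// malloc returned just below the aligned block.
inline void* sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
    assert(alignment >= alignof(void*) && (alignment & (alignment - 1)) == 0);
    void* raw = std::malloc(size + alignment - 1 + sizeof(void*));
    if (raw == nullptr) return nullptr;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    reinterpret_cast<void**>(aligned)[-1] = raw;   // remember the real block
    return reinterpret_cast<void*>(aligned);
}

// Must be used instead of free(): the aligned pointer is not the one
// malloc returned, so the stashed pointer is recovered first.
inline void sketch_aligned_free(void* p) {
    if (p != nullptr) std::free(static_cast<void**>(p)[-1]);
}
```

Passing the aligned pointer straight to free would hand the allocator an address it never returned, which is exactly the run-time failure the _mm_free rule guards against.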
