SSE: Difference between _mm_load/store vs. using direct pointer access

Views: 27 | Published: 2021/8/27 19:44:44 | Tags: x86 sse simd


Question

Suppose I want to add two buffers and store the result. Both buffers are already allocated with 16-byte alignment. I found two examples of how to do that.

The first one uses _mm_load to read the data from the buffers into SSE registers, performs the add operation, and stores the result back to memory with _mm_store. Until now I would have done it like that.

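#include <stddef.h>     // size_t
#include <stdint.h>     // uint16_t
#include <emmintrin.h>  // SSE2 intrinsics

// Assumes n is a multiple of 8 and both pointers are 16-byte aligned.
void _add( uint16_t * dst, uint16_t const * src, size_t n )
{
  for( uint16_t const * end( dst + n ); dst != end; dst+=8, src+=8 )
  {
    // Explicit aligned loads into vector variables
    __m128i _s = _mm_load_si128( (__m128i const*) src );
    __m128i _d = _mm_load_si128( (__m128i const*) dst );

    _d = _mm_add_epi16( _d, _s );

    // Explicit aligned store back to the destination buffer
    _mm_store_si128( (__m128i*) dst, _d );
  }
}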

The second example just does the add operation directly on the memory addresses, without explicit load/store operations. Both seem to work fine.

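// Assumes n is a multiple of 8 and both pointers are 16-byte aligned.
void _add( uint16_t * dst, uint16_t const * src, size_t n )
{
  for( uint16_t const * end( dst + n ); dst != end; dst+=8, src+=8 )
  {
    // Dereferencing __m128i pointers lets the compiler pick the loads and stores
    *(__m128i*) dst = _mm_add_epi16( *(__m128i*) dst, *(__m128i const*) src );
  }
}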

So the question is whether the second example is correct or may have any side effects, and when using load/store is mandatory.

Thanks.

Recommended answer

Both versions are fine. If you look at the generated code, you will see that the second version still generates at least one load to a vector register, since PADDW (aka _mm_add_epi16) can take only its second operand directly from memory; the first operand must already be in a register.
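As a rough check of the claim that both versions behave the same, here is a minimal sketch that runs the two variants on identical data and compares the buffers. The names add_load_store, add_deref and versions_agree are made up for this illustration, and the sketch assumes 16-byte-aligned buffers whose length is a multiple of 8 elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics

// Variant 1 (hypothetical name): explicit aligned load/store intrinsics.
void add_load_store(std::uint16_t* dst, const std::uint16_t* src, std::size_t n) {
    assert(n % 8 == 0);  // assumption: whole 128-bit vectors only
    for (std::size_t i = 0; i < n; i += 8) {
        __m128i s = _mm_load_si128(reinterpret_cast<const __m128i*>(src + i));
        __m128i d = _mm_load_si128(reinterpret_cast<const __m128i*>(dst + i));
        _mm_store_si128(reinterpret_cast<__m128i*>(dst + i), _mm_add_epi16(d, s));
    }
}

// Variant 2 (hypothetical name): dereference __m128i pointers directly and
// leave the choice of load/store instructions to the compiler.
void add_deref(std::uint16_t* dst, const std::uint16_t* src, std::size_t n) {
    assert(n % 8 == 0);  // assumption: whole 128-bit vectors only
    for (std::size_t i = 0; i < n; i += 8) {
        __m128i* d = reinterpret_cast<__m128i*>(dst + i);
        const __m128i* s = reinterpret_cast<const __m128i*>(src + i);
        *d = _mm_add_epi16(*d, *s);
    }
}

// Runs both variants on the same input and reports whether the outputs match.
bool versions_agree() {
    alignas(16) std::uint16_t a[16];
    alignas(16) std::uint16_t b[16];
    alignas(16) std::uint16_t src[16];
    for (int i = 0; i < 16; ++i) {
        a[i] = b[i] = static_cast<std::uint16_t>(i * 1000);
        src[i] = static_cast<std::uint16_t>(65535 - i);  // forces wraparound in some lanes
    }
    add_load_store(a, src, 16);
    add_deref(b, src, 16);
    return std::memcmp(a, b, sizeof a) == 0;
}
```

To see the loads the answer mentions, one option is to compile with optimization and inspect the assembly output (for example with a compiler's -S flag); the exact instructions will depend on the compiler and flags.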

In practice, most non-trivial SIMD code will do a lot more operations between loading and storing data than just a single add, so in general you probably want to load data initially into vector variables (registers) using _mm_load_XXX, perform all your SIMD operations on registers, then store the results back to memory via _mm_store_XXX.
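The load, compute, store pattern described above might look like the following sketch. The kernel itself (a saturating add followed by a halving shift) and the name add_sat_halve are invented for illustration only; the point is that each vector is loaded once, several operations run on vector variables, and the result is stored once. It assumes 16-byte-aligned buffers and a length that is a multiple of 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical kernel: dst[i] = saturating_add(dst[i], src[i]) / 2
void add_sat_halve(std::uint16_t* dst, const std::uint16_t* src, std::size_t n) {
    assert(n % 8 == 0);  // assumption: whole 128-bit vectors only
    for (std::size_t i = 0; i < n; i += 8) {
        // Load each input once into a vector variable.
        __m128i s = _mm_load_si128(reinterpret_cast<const __m128i*>(src + i));
        __m128i d = _mm_load_si128(reinterpret_cast<const __m128i*>(dst + i));

        // Do all the arithmetic on vector variables.
        d = _mm_adds_epu16(d, s);  // unsigned 16-bit add, clamped at 65535
        d = _mm_srli_epi16(d, 1);  // logical shift right by 1 (divide by 2)

        // Store the final result once.
        _mm_store_si128(reinterpret_cast<__m128i*>(dst + i), d);
    }
}
```

If the buffers cannot be guaranteed to be 16-byte aligned, the unaligned variants _mm_loadu_si128 and _mm_storeu_si128 would be the ones to use here, since both _mm_load_si128 and a plain __m128i pointer dereference require aligned addresses.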
