SSE: Difference between _mm_load/store and direct pointer access


Question

Suppose I want to add two buffers and store the result. Both buffers are already allocated with 16-byte alignment. I found two examples of how to do that.

The first one uses _mm_load to read the data from the buffers into SSE registers, performs the add operation, and stores the result back to the destination buffer. Until now I would have done it like that.

void _add( uint16_t * dst, uint16_t const * src, size_t n )
{
  for( uint16_t const * end( dst + n ); dst != end; dst+=8, src+=8 )
  {
    // Aligned loads: both pointers must be 16-byte aligned.
    __m128i _s = _mm_load_si128( (__m128i const*) src );
    __m128i _d = _mm_load_si128( (__m128i const*) dst );

    _d = _mm_add_epi16( _d, _s );

    _mm_store_si128( (__m128i*) dst, _d );
  }
}

The second example performs the add operation directly on the memory addresses, without explicit load/store calls. Both seem to work fine.

void _add( uint16_t * dst, uint16_t const * src, size_t n )
{
  for( uint16_t const * end( dst + n ); dst != end; dst+=8, src+=8 )
  {
    // Dereferencing a __m128i pointer is also an aligned access.
    *(__m128i*) dst = _mm_add_epi16( *(__m128i*) dst, *(__m128i const*) src );
  }
}

So the question is whether the second example is correct or may have side effects, and when using load/store is mandatory.

Thanks.

Recommended answer

Both versions are fine. If you look at the generated code, you will see that the second version still generates at least one load into a vector register, because PADDW (the instruction behind _mm_add_epi16) can take only its second (source) operand directly from memory; the first operand must already be in a register.
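To make the equivalence concrete, here is a minimal sketch that runs both styles on the same input and compares the results. The function names (add_load_store, add_deref, versions_agree) and the test data are hypothetical and not from the original post; the sketch assumes SSE2 is available, that the buffers are 16-byte aligned, and that n is a multiple of 8.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p sits on a 16-byte boundary.
static bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Version 1: explicit aligned loads and store.
void add_load_store(std::uint16_t* dst, const std::uint16_t* src, std::size_t n)
{
    assert(n % 8 == 0 && is_aligned16(dst) && is_aligned16(src));
    for (std::size_t i = 0; i < n; i += 8) {
        __m128i s = _mm_load_si128(reinterpret_cast<const __m128i*>(src + i));
        __m128i d = _mm_load_si128(reinterpret_cast<const __m128i*>(dst + i));
        _mm_store_si128(reinterpret_cast<__m128i*>(dst + i), _mm_add_epi16(d, s));
    }
}

// Version 2: dereference __m128i pointers and let the compiler emit the moves.
void add_deref(std::uint16_t* dst, const std::uint16_t* src, std::size_t n)
{
    assert(n % 8 == 0 && is_aligned16(dst) && is_aligned16(src));
    for (std::size_t i = 0; i < n; i += 8) {
        __m128i* d = reinterpret_cast<__m128i*>(dst + i);
        const __m128i* s = reinterpret_cast<const __m128i*>(src + i);
        *d = _mm_add_epi16(*d, *s);
    }
}

// Runs both versions on identical input and reports whether they agree
// with each other and with a scalar reference (16-bit wrap-around add).
bool versions_agree()
{
    alignas(16) std::uint16_t a[16], b[16], src[16];
    for (int i = 0; i < 16; ++i) {
        a[i] = b[i] = static_cast<std::uint16_t>(i * 1000);
        src[i] = static_cast<std::uint16_t>(65000 + i);
    }
    add_load_store(a, src, 16);
    add_deref(b, src, 16);
    for (int i = 0; i < 16; ++i) {
        const std::uint16_t expected =
            static_cast<std::uint16_t>(i * 1000 + 65000 + i);
        if (a[i] != expected || b[i] != expected) return false;
    }
    return true;
}
```

Calling versions_agree() from a test or from main should return true. Note that both versions perform aligned accesses, so passing a pointer that is not 16-byte aligned is an error in either style; for unaligned data, _mm_loadu_si128 and _mm_storeu_si128 are the usual alternative.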

In practice, most non-trivial SIMD code performs many more operations between loading and storing data than a single add. In general, you probably want to load the data into vector variables (registers) with _mm_load_XXX first, perform all your SIMD operations on registers, and then store the results back to memory with _mm_store_XXX.
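As an illustration of that load, compute, store pattern, the sketch below chains two operations on registers before writing back. The function add_sat_halve, its demo helper, and the choice of operations (a saturating unsigned add followed by a right shift by one) are assumptions made up for this example, not part of the original answer; it assumes SSE2, 16-byte aligned buffers, and n being a multiple of 8.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example: dst[i] = saturating_add(dst[i], src[i]) / 2.
void add_sat_halve(std::uint16_t* dst, const std::uint16_t* src, std::size_t n)
{
    assert(n % 8 == 0);
    for (std::size_t i = 0; i < n; i += 8) {
        // 1. Load once into vector variables.
        __m128i d = _mm_load_si128(reinterpret_cast<const __m128i*>(dst + i));
        __m128i s = _mm_load_si128(reinterpret_cast<const __m128i*>(src + i));

        // 2. Do all the arithmetic on registers.
        __m128i sum  = _mm_adds_epu16(d, s);    // unsigned saturating add
        __m128i half = _mm_srli_epi16(sum, 1);  // logical shift right by 1

        // 3. Store once.
        _mm_store_si128(reinterpret_cast<__m128i*>(dst + i), half);
    }
}

// Small self-check against hand-computed values.
bool add_sat_halve_demo()
{
    alignas(16) std::uint16_t dst[8] = {100, 60000, 0, 1, 65535, 7, 8, 9};
    alignas(16) std::uint16_t src[8] = {50, 10000, 0, 2, 65535, 7, 8, 9};
    const std::uint16_t expected[8]  = {75, 32767, 0, 1, 32767, 7, 8, 9};

    add_sat_halve(dst, src, 8);
    for (int i = 0; i < 8; ++i) {
        if (dst[i] != expected[i]) return false;
    }
    return true;
}
```

Keeping intermediate values in named __m128i variables leaves register allocation to the compiler and makes the memory traffic explicit: one load per input and one store per output, regardless of how many operations sit in between.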
