Why should you not access the __m128i fields directly?

Question

I was reading this on MSDN, and it says:

You should not access the __m128i fields directly. You can, however, see these types in the debugger. A variable of type __m128i maps to the XMM[0-7] registers.

However, it doesn't explain why. Why is that? For example, is the following "bad"?

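#include <emmintrin.h>  // MSVC declares __m128i here, including the m128i_* union members

void func(unsigned short x, unsigned short y)
{
    __m128i a;
    a.m128i_i64[0] = x;   // MSVC-only field; a.m128i_i64[1] is left uninitialized

    __m128i b;
    b.m128i_i64[0] = y;   // likewise, b.m128i_i64[1] is left uninitialized

    // Now do something with a and b ...
}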

Instead of doing the assignments like in the example above, should one use some sort of load function?
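For comparison, here is a minimal portable sketch of the same idea using SSE2 intrinsics instead of the MSVC-only m128i_i64 field. _mm_cvtsi32_si128 places a 32-bit value in element 0 and zeroes the remaining 96 bits, so, unlike the snippet above, the upper lanes are well defined (and the low 64-bit lane equals x). The name func_portable and the choice to add the vectors and return lane 0 are hypothetical, added only so the result can be checked.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, available in MSVC, GCC and Clang */

/* Hypothetical portable rewrite of func(): returns lane 0 of a + b. */
int func_portable(unsigned short x, unsigned short y)
{
    /* MOVD: x goes into the low 32 bits, the other 96 bits are zeroed. */
    __m128i a = _mm_cvtsi32_si128(x);
    __m128i b = _mm_cvtsi32_si128(y);

    /* Stand-in for "do something with a and b". */
    __m128i sum = _mm_add_epi32(a, b);

    /* Read lane 0 back with MOVD instead of through a struct field. */
    return _mm_cvtsi128_si32(sum);
}
```

For example, func_portable(3, 4) returns 7.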

Answer

The m128i_i64 field and its relatives (m128i_i8, m128i_u32 and so on) are Microsoft-specific compiler extensions. They don't exist in most other compilers.

Nevertheless, they are useful for testing purposes.
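When you only need to inspect the elements in a test or while debugging, a portable alternative is to store the whole vector to an ordinary array and read the array. A minimal sketch, assuming 32-bit lanes; dump_epi32 is a hypothetical helper name, not part of any library.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical test/debug helper: copy the four 32-bit lanes of v into
   out[0..3], lane 0 first. It goes through memory, so keep it out of
   hot loops. */
void dump_epi32(__m128i v, int32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, v);  /* unaligned 16-byte store */
}
```

Note that _mm_set_epi32 takes its arguments from the highest lane down, so _mm_set_epi32(4, 3, 2, 1) dumps as {1, 2, 3, 4}.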

The real reason for avoiding their use is performance. The hardware cannot efficiently access individual elements of a SIMD vector.

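  • There are no instructions that let you access an individual element at a runtime index. (SSE2 has element insert/extract instructions for 16-bit elements and SSE4.1 adds 8-, 32- and 64-bit versions, but all of them require a compile-time constant index.)
  • Going through memory instead may incur a very large penalty when store forwarding fails.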
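The two points above can be illustrated with a short sketch using only SSE2. It shows an extract whose lane number must be an immediate, the memory round trip needed for a runtime index, and the "several narrow stores, then one wide load" pattern that the question's field assignments boil down to. The function names are hypothetical, and whether the last pattern actually stalls depends on the CPU and on what the compiler emits (an optimizing compiler may replace it with register shuffles).

```c
#include <emmintrin.h>
#include <stdint.h>

/* Constant index: PEXTRW (SSE2) encodes the lane number as an immediate,
   so the second argument must be a compile-time constant. */
int extract_lane3_epi16(__m128i v)
{
    return _mm_extract_epi16(v, 3);  /* zero-extended 16-bit value */
}

/* Runtime index: no SSE instruction takes a variable lane number, so
   spill the vector to memory and index the array instead. */
int16_t extract_epi16_runtime(__m128i v, int i)
{
    int16_t tmp[8];
    _mm_storeu_si128((__m128i *)tmp, v);
    return tmp[i & 7];  /* masked to 0..7 to stay in bounds */
}

/* Several narrow scalar stores followed by one 16-byte vector load.
   On many x86 cores such a load cannot be forwarded from multiple
   pending stores and must wait until they reach the cache. */
__m128i build_through_memory(int32_t e0, int32_t e1, int32_t e2, int32_t e3)
{
    int32_t tmp[4] = { e0, e1, e2, e3 };
    return _mm_loadu_si128((const __m128i *)tmp);
}
```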

AVX and AVX2 don't extend the SSE4.1 instructions to allow accessing elements in a 256-bit vector, and as far as I can tell, AVX512 will not have them for 512-bit vectors either.

Likewise, the set intrinsics (such as _mm256_set_pd()) suffer from the same issue. They are implemented either as a series of data-shuffling operations or by going through memory and taking on the store-forwarding stalls.
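To make the "series of data-shuffling operations" concrete, here is one plausible SSE2-only sequence equivalent to _mm_set_epi32(e3, e2, e1, e0). It is a sketch of the kind of code a compiler might emit, not what any particular compiler is guaranteed to produce; set_epi32_by_shuffles is a hypothetical name. Building one vector from four scalars costs four register moves plus three shuffles.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Same result as _mm_set_epi32(e3, e2, e1, e0): lane 0 holds e0.
   Uses only MOVD and unpack (shuffle) instructions, no memory. */
__m128i set_epi32_by_shuffles(int32_t e3, int32_t e2, int32_t e1, int32_t e0)
{
    __m128i v0 = _mm_cvtsi32_si128(e0);       /* [e0, 0, 0, 0]   */
    __m128i v1 = _mm_cvtsi32_si128(e1);       /* [e1, 0, 0, 0]   */
    __m128i v2 = _mm_cvtsi32_si128(e2);       /* [e2, 0, 0, 0]   */
    __m128i v3 = _mm_cvtsi32_si128(e3);       /* [e3, 0, 0, 0]   */
    __m128i lo = _mm_unpacklo_epi32(v0, v1);  /* [e0, e1, 0, 0]  */
    __m128i hi = _mm_unpacklo_epi32(v2, v3);  /* [e2, e3, 0, 0]  */
    return _mm_unpacklo_epi64(lo, hi);        /* [e0, e1, e2, e3] */
}
```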

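Which raises the question: is there an efficient way to fill a SIMD vector from scalar components (or to split a SIMD vector back into its scalar components)?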

Short answer: not really. When you use SIMD, you're expected to do a lot of the work in vectorized form, so the initialization overhead should not matter.
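A minimal sketch of what "the initialization overhead should not matter" looks like in practice: the vector built from a scalar is created once, outside the loop, and then reused for every block of four elements, so its cost is amortized over the whole array. add_scalar_epi32 is a hypothetical example function, and adding a constant stands in for whatever real work the loop does.

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Add k to every element of data[0..n-1]. The broadcast of k is built
   once; the loop body is pure vector work with no per-element setup. */
void add_scalar_epi32(int32_t *data, size_t n, int32_t k)
{
    const __m128i vk = _mm_set1_epi32(k);  /* one-time initialization */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
        _mm_storeu_si128((__m128i *)(data + i), _mm_add_epi32(v, vk));
    }
    for (; i < n; ++i)  /* scalar tail for the last n % 4 elements */
        data[i] += k;
}
```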
