Issue with __m256 type of Intel intrinsics


Question

I'm trying to test some of the Intel intrinsics to see how they work. So I created a function to do that for me, and this is the code:

void test_intel_256()
{
__m256 res,vec1,vec2;

__M256_MM_SET_PS(vec1, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0);
__M256_MM_SET_PS(vec1, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0);

__M256_MM_ADD_PS(res,vec1,vec2);

if (res[0] ==9 && res[1] ==9 && res[2] ==9 && res[3] ==9 
  && res[4] ==9 && res[5] ==9 && res[6] ==9 && res[7] ==9 )
    printf("Addition : OK!\n");
else
    printf("Addition : FAILED!\n");
}

But then I'm getting these errors:

error: unknown type name ‘__m256’
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector 
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector

Meaning that the compiler is not recognizing the __m256 type and, as a consequence, it can't see res as an array of floats. I'm including these libraries: mmintrin.h, emmintrin.h, xmmintrin.h, and I'm using Eclipse Mars.

So what I want to know is whether the problem comes from the compiler, the hardware, or something else, and how I can solve it. Thank you!

Recommended answer

MMX and SSE2 are baseline for x86-64, but AVX is not. You need to enable AVX explicitly, which you didn't have to do for SSE2.

Build with -march=haswell or whatever CPU you actually have. Or just use -mavx.
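To tell whether a failure comes from the build flags or from the hardware, a small check can help. The sketch below assumes GCC or Clang on x86: __AVX__ is the macro those compilers predefine when AVX code generation is enabled (for example by -mavx), and __builtin_cpu_supports is their builtin for querying the running CPU. The function names are made up for illustration.

```c
#include <stdio.h>

/* Returns 1 if this translation unit was compiled with AVX enabled
   (e.g. -mavx or -march=haswell), 0 otherwise. */
int avx_enabled_at_compile_time(void)
{
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the CPU running this code reports AVX support, 0 if not.
   Assumes GCC or Clang targeting x86; other setups return -1 (unknown). */
int avx_supported_by_cpu(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return -1;
#endif
}

/* Prints both answers side by side (hypothetical helper). */
void report_avx(void)
{
    printf("compiled with AVX: %d, CPU supports AVX: %d\n",
           avx_enabled_at_compile_time(), avx_supported_by_cpu());
}
```

If the first value is 0 while the second is 1, the hardware is fine and only the compiler flags are missing, which matches the situation in the question.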

Beware that gcc -mavx with the default -mtune=generic will split 256-bit loadu/storeu intrinsics into vmovups xmm / vinsertf128, which is bad if your data is actually aligned most of the time, and especially bad on Haswell with its limited shuffle-port throughput.

It's good for Sandybridge and Bulldozer-family if your data really is unaligned, though. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80568: it even affects AVX2 vector-integer code, even though all AVX2 CPUs (except maybe Excavator and Ryzen) are harmed by this tuning. -mtune=generic doesn't take into account which instruction-set extensions are enabled, and there's no -mtune=generic-avx2.

You could use -mavx2 -mno-avx256-split-unaligned-load -mno-avx256-split-unaligned-store. That still doesn't turn on other tuning options (like optimizing for macro-fusion of compare and branch) that would suit all modern x86 CPUs except low-power ones, because gcc's -mtune=generic doesn't enable them (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78855).
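The aligned/unaligned distinction is visible directly in the intrinsics. A minimal sketch, assuming GCC or Clang and an AVX-capable CPU: _mm256_load_ps and _mm256_store_ps require 32-byte-aligned addresses, while _mm256_loadu_ps and _mm256_storeu_ps accept any address and are the ones affected by the splitting discussed above. The target("avx") attribute is used only so the snippet builds without -mavx on the command line; the function names are hypothetical.

```c
#include <immintrin.h>
#include <stdalign.h>

/* Adds 8 floats; both inputs and the output must be 32-byte aligned. */
__attribute__((target("avx")))
void add8_aligned(const float *a, const float *b, float *out)
{
    __m256 va = _mm256_load_ps(a);
    __m256 vb = _mm256_load_ps(b);
    _mm256_store_ps(out, _mm256_add_ps(va, vb));
}

/* Same operation, but with no alignment requirement on the pointers. */
__attribute__((target("avx")))
void add8_unaligned(const float *a, const float *b, float *out)
{
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(out, _mm256_add_ps(va, vb));
}

/* Runs both variants on small buffers; returns 1 if every lane is 9. */
int check_add8(void)
{
    alignas(32) float a[8], b[8], out[8];
    float ua[9], ub[9], uout[9];  /* used from offset 1: no 32-byte alignment guarantee */
    int ok = 1;

    for (int i = 0; i < 8; i++) {
        a[i] = 7.0f;
        b[i] = 2.0f;
        ua[i + 1] = 7.0f;
        ub[i + 1] = 2.0f;
    }

    add8_aligned(a, b, out);
    add8_unaligned(ua + 1, ub + 1, uout + 1);

    for (int i = 0; i < 8; i++)
        ok = ok && out[i] == 9.0f && uout[i + 1] == 9.0f;
    return ok;
}
```

Passing a pointer that is not 32-byte aligned to add8_aligned would typically fault, so the unaligned variant is the safe default when the alignment of the data is not known.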

Also:

"I'm including these libraries mmintrin.h, emmintrin.h, xmmintrin.h"

Don't do that. Always just include immintrin.h in SIMD code. It pulls in all Intel SSE/AVX extensions. Leaving it out is why you get error: unknown type name ‘__m256’
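For reference, here is one way the question's test could be written with the documented intrinsics instead of the custom __M256_MM_* macros, whose definitions the question does not show. It is a sketch that assumes GCC or Clang and an AVX-capable CPU. It includes only immintrin.h, sets both vec1 and vec2 (the original sets vec1 twice), and copies the result into a float array before comparing. Returning a status value is an addition made here for illustration.

```c
#include <immintrin.h>
#include <stdio.h>

/* Returns 1 if all eight lanes of 7 + 2 equal 9, 0 otherwise.
   The target attribute lets this build without -mavx on GCC/Clang;
   with -mavx or -march=haswell on the command line it is not needed. */
__attribute__((target("avx")))
int test_intel_256(void)
{
    __m256 vec1 = _mm256_set1_ps(7.0f);   /* broadcast 7.0 to all 8 lanes */
    __m256 vec2 = _mm256_set1_ps(2.0f);   /* broadcast 2.0 to all 8 lanes */
    __m256 res  = _mm256_add_ps(vec1, vec2);

    float out[8];
    _mm256_storeu_ps(out, res);           /* copy the lanes to ordinary memory */

    int ok = 1;
    for (int i = 0; i < 8; i++)
        ok = ok && (out[i] == 9.0f);

    printf("Addition : %s\n", ok ? "OK!" : "FAILED!");
    return ok;
}
```

Storing to an array and indexing that array sidesteps the "subscripted value" errors, because out is a real array of float rather than a vector type.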

Keep in mind that subscripting vector types like __m256 is non-standard and non-portable. They're not arrays, and there's no reason you should expect [] to work like an array. Extracting the 3rd element or something from a SIMD vector in a register requires a shuffle instruction, not a load.
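To make the shuffle point concrete, the sketch below pulls single lanes out of a __m256 without going through memory. It assumes GCC or Clang and an AVX-capable CPU; the function names and the test vector {0, 1, ..., 7} are invented for the example, and the target("avx") attribute is there only so it builds without -mavx.

```c
#include <immintrin.h>

/* Builds {0,1,2,3,4,5,6,7} and returns the lane at index 3,
   which sits in the low 128 bits of the register. */
__attribute__((target("avx")))
float lane3_demo(void)
{
    __m256 v  = _mm256_setr_ps(0.f, 1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f);
    __m128 lo = _mm256_castps256_ps128(v);                        /* lanes 0..3 */
    __m128 s  = _mm_shuffle_ps(lo, lo, _MM_SHUFFLE(3, 3, 3, 3));  /* move lane 3 to position 0 */
    return _mm_cvtss_f32(s);                                      /* read position 0 as a scalar */
}

/* Same idea for the lane at index 5, which sits in the high 128 bits. */
__attribute__((target("avx")))
float lane5_demo(void)
{
    __m256 v  = _mm256_setr_ps(0.f, 1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f);
    __m128 hi = _mm256_extractf128_ps(v, 1);                      /* lanes 4..7 */
    __m128 s  = _mm_shuffle_ps(hi, hi, _MM_SHUFFLE(1, 1, 1, 1));  /* lane 5 is index 1 of hi */
    return _mm_cvtss_f32(s);
}
```

When many lanes are needed at once, storing the whole vector with _mm256_storeu_ps and reading the array is usually simpler than a shuffle per element.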

If you want handy wrappers for vector types that let you do things like use operator[] to extract scalars from elements of vector variables, have a look at Agner Fog's Vector Class Library. It's GPLed, so you'll have to look at other wrapper libraries if that's a problem.

It lets you do stuff like

// example from the manual for operator[]
Vec4i a(10,11,12,13);
int b = a[2];   // b = 12

You can use normal intrinsics on VCL types. Vec8f is a transparent wrapper on __m256, so you can use it with _mm256_mul_ps.
