Compiling library with SSE2 and AVX2

Question

I am using VS2015 to compile a library that has both SSE2 instructions and AVX2 instructions (which are only used if detected in the CPU). If I compile the library with /arch:AVX2 but only call the SSE2 instructions, I get "illegal instruction" on the first SSE2 instruction called (_mm_set1_epi32). However, if I compile the lib with /arch:SSE2, it works fine when calling the SSE2 instructions.

Are the /arch settings mutually exclusive? If not, how should this be fixed? I have tried building both as a shared lib and as a static lib, with the same issue.

This is the lib: https://github.com/Auburns/FastNoiseSIMD and there is an issue about it at https://github.com/Auburns/FastNoiseSIMD/issues/20, although I don't think that issue is directly related to AVX2 being on and calling SSE2 instructions.

Recommended answer

If you build with /arch:AVX or /arch:AVX2, the primary impact is that all SSE code generated by the compiler will use the VEX prefix encoding, which allows for more efficient scheduling of registers. If you run such code on a system without AVX or AVX2 support, it will in fact fault with an illegal instruction.

In other words, _mm_set1_epi32 is an SSE2 intrinsic, but because you built with /arch:AVX2, the compiler emitted the corresponding instructions using the VEX prefix. The /arch switch impacts explicit intrinsics, compiler-generated floating-point math, the autovectorizer, etc.
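Because VEX-encoded code faults on a CPU without AVX, any code built with /arch:AVX or /arch:AVX2 should only be reached after a runtime check. The following is a minimal sketch of such a check, not the library's own detection code; `CpuFeatures` and `DetectCpuFeatures` are hypothetical names introduced for illustration. It assumes the MSVC intrinsics `__cpuid`, `__cpuidex` and `_xgetbv`, and falls back to the GCC/Clang builtin `__builtin_cpu_supports` elsewhere.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>     // __cpuid, __cpuidex
#include <immintrin.h>  // _xgetbv
#endif

// Hypothetical helper type: which instruction sets are safe to use.
struct CpuFeatures {
    bool sse2;
    bool avx;
    bool avx2;
};

// Hypothetical helper: query the CPU (and OS) once at startup.
CpuFeatures DetectCpuFeatures() {
    CpuFeatures f{false, false, false};
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};  // EAX, EBX, ECX, EDX
    __cpuid(regs, 0);
    const int maxLeaf = regs[0];
    if (maxLeaf >= 1) {
        __cpuid(regs, 1);
        f.sse2 = (regs[3] & (1 << 26)) != 0;             // EDX bit 26: SSE2
        const bool osxsave = (regs[2] & (1 << 27)) != 0;  // ECX bit 27: OSXSAVE
        const bool avxBit = (regs[2] & (1 << 28)) != 0;   // ECX bit 28: AVX
        if (osxsave && avxBit) {
            // The OS must also save/restore the XMM and YMM state (XCR0 bits 1 and 2).
            const unsigned long long xcr0 = _xgetbv(0);
            f.avx = (xcr0 & 0x6) == 0x6;
        }
    }
    if (f.avx && maxLeaf >= 7) {
        __cpuidex(regs, 7, 0);
        f.avx2 = (regs[1] & (1 << 5)) != 0;               // EBX bit 5: AVX2
    }
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.sse2 = __builtin_cpu_supports("sse2") != 0;
    f.avx = __builtin_cpu_supports("avx") != 0;
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
#endif
    // On other compilers or architectures everything stays false.
    return f;
}
```

The check itself has to live in code that was not compiled with /arch:AVX or /arch:AVX2, otherwise the detection routine could fault before it reports anything.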

If you want to support 'stock' SSE/SSE2, AVX, and AVX2 platforms with optimized codepaths using the automatic generation supported by the /arch switch, you need three different binaries (EXEs or DLLs).
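An alternative that is sometimes used instead of shipping separate binaries, and that is not part of the answer above, is to set /arch per source file and pick a code path at run time through a function pointer. The sketch below only illustrates the dispatch step under that assumption; `SimdLevel`, `FillScalar`, `FillSSE2`, `FillAVX2` and `SelectFill` are hypothetical names, and the AVX2 function is a stand-in for code that would live in its own file compiled with /arch:AVX2.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1
#endif

// Hypothetical enumeration of the code paths a library might ship.
enum class SimdLevel { Scalar, SSE2, AVX2 };

// Baseline path: safe on every CPU.
void FillScalar(std::int32_t* dst, std::size_t n, std::int32_t v) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = v;
}

// SSE2 path: in a real build this file would be compiled WITHOUT /arch:AVX*,
// so that _mm_set1_epi32 is emitted with the legacy (non-VEX) encoding.
void FillSSE2(std::int32_t* dst, std::size_t n, std::int32_t v) {
#ifdef SKETCH_HAVE_SSE2
    const __m128i value = _mm_set1_epi32(v);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), value);
    for (; i < n; ++i) dst[i] = v;  // remaining tail elements
#else
    FillScalar(dst, n, v);
#endif
}

// Stand-in for the AVX2 path. The real version would sit in a separate
// source file compiled with /arch:AVX2 and use 256-bit intrinsics.
void FillAVX2(std::int32_t* dst, std::size_t n, std::int32_t v) {
    FillScalar(dst, n, v);
}

using FillFn = void (*)(std::int32_t*, std::size_t, std::int32_t);

// Choose the implementation once, based on what the CPU was detected to support.
FillFn SelectFill(SimdLevel level) {
    switch (level) {
        case SimdLevel::AVX2: return &FillAVX2;
        case SimdLevel::SSE2: return &FillSSE2;
        default:              return &FillScalar;
    }
}
```

With this layout only the function returned by `SelectFill` is ever called, so VEX-encoded code is not executed on a CPU that lacks AVX. One caveat of the per-file approach is that inline functions and templates shared between files can end up compiled with the wider instruction set, so the AVX2 file should include as little shared code as possible.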

See also this blog post as well as this one.

Note that the main difference between /arch:AVX and /arch:AVX2 is that with /arch:AVX2 the compiler will sometimes emit FMA3 instructions where the scheduler thinks it would be faster than a multiply followed by an add.
