Using xmm parameters in AVX intrinsics


Question

Is it possible to use xmm register parameters with the AVX intrinsic functions (_mm256_**_**)?

My code requires vector integer operations (for loading and storing data) along with vector floating-point operations. The integer code is written with SSE2 intrinsics to stay compatible with older CPUs, while the floating-point code is written with AVX to improve speed (there is also an SSE code branch, so please do not suggest that).

Currently, apart from using a compiler flag to automatically convert all SSE instructions to their VEX-encoded versions, is there any way, using intrinsic functions only (i.e. no inline or external assembly), to force the use of VEX-encoded instructions on XMM registers?

Note: I tried _mm256_castsi128_si256(), and this generates instructions with ymm operands.

Recommended answer

You have a processor with AVX. On such a processor the XMM registers are not separate registers: each one is the lower 128 bits of the corresponding YMM register. If you compile all your code with AVX support (e.g. with -mavx in GCC or /arch:AVX in MSVC), then all your SSE2 code is emitted as VEX-encoded instructions that operate on the lower 128 bits of the YMM registers. There is nothing to worry about.
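As a rough illustration of mixing the two intrinsic families in one function, here is a minimal sketch. It assumes GCC or Clang, where the `target("avx")` function attribute has the same effect for that one function as building it with -mavx; the function name `scale_i32_to_f32` is made up for this example. Under that assumption the 128-bit SSE2 intrinsics should come out VEX-encoded on xmm registers, without any cast to a 256-bit type on the integer side.

```c
#include <immintrin.h>  /* SSE2 and AVX intrinsics */
#include <stdint.h>

/* Hypothetical helper: convert eight int32 values to float and scale them.
   target("avx") is a GCC/Clang attribute (an assumption of this sketch);
   MSVC would use /arch:AVX for the whole file instead. */
__attribute__((target("avx")))
void scale_i32_to_f32(const int32_t *src, float *dst, float factor)
{
    /* Integer side: plain SSE2 intrinsics on 128-bit values. */
    __m128i lo_i = _mm_loadu_si128((const __m128i *)src);
    __m128i hi_i = _mm_loadu_si128((const __m128i *)(src + 4));

    /* Convert each half to float; these are still 128-bit operations. */
    __m128 lo_f = _mm_cvtepi32_ps(lo_i);
    __m128 hi_f = _mm_cvtepi32_ps(hi_i);

    /* Join the halves into one 256-bit value, then do the float math in AVX. */
    __m256 v = _mm256_insertf128_ps(_mm256_castps128_ps256(lo_f), hi_f, 1);
    v = _mm256_mul_ps(v, _mm256_set1_ps(factor));

    _mm256_storeu_ps(dst, v);
}
```

Inspecting the generated assembly (for example with `gcc -O2 -S`) is the way to confirm which encoding a particular compiler actually chose.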

However, suppose you have two modules, one compiled with SSE2 support (e.g. with -msse2 in GCC or /arch:SSE2 in MSVC) and the other with AVX support, and you use functions from both. Then you do have something to worry about when you switch between them. In that case you should call _mm256_zeroupper() or _mm256_zeroall() when you switch from AVX to SSE2 code, otherwise you take a performance hit. See: Using AVX CPU instructions: Poor performance without "/arch:AVX"
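The transition can be sketched as follows. This is an illustration under assumptions, not a definitive pattern: `legacy_sse_sum4` is a made-up stand-in for code built without AVX (in practice it would live in a separate module or library), and the `target("avx")` and `noinline` attributes assume GCC or Clang.

```c
#include <immintrin.h>

/* Stand-in for code compiled without AVX (hypothetical). noinline keeps it
   from being merged into the AVX caller below. */
__attribute__((noinline))
float legacy_sse_sum4(const float *p)
{
    __m128 v = _mm_loadu_ps(p);
    /* Horizontal sum using SSE only: swap pairs, add, fold high onto low. */
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(v, shuf);
    shuf = _mm_movehl_ps(shuf, sums);
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);
}

/* AVX code that hands its result to the legacy routine. */
__attribute__((target("avx")))
float double8_then_sum(const float *p)
{
    __m256 v = _mm256_loadu_ps(p);
    float tmp[8];
    _mm256_storeu_ps(tmp, _mm256_add_ps(v, v));

    /* Clear the upper 128 bits of every ymm register before any
       non-VEX SSE instruction runs. */
    _mm256_zeroupper();

    return legacy_sse_sum4(tmp) + legacy_sse_sum4(tmp + 4);
}
```

_mm256_zeroall() would also work here, but it clears the full registers, so it only fits when no vector value needs to stay live across the call.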

The simple solution is to compile all your code with AVX support. The only reason I can think of to compile different modules with different instruction set support is to build a CPU dispatcher, so that your code can run on different processors. That is a bit of a pain to implement. But with a dispatcher each code path sticks to one instruction set, so there are no state changes. The only case I can think of where you need to worry about a state change is when you call functions from a shared library that was compiled for another instruction set (e.g. a DLL compiled with SSE2). In that case you may need to call _mm256_zeroupper() or _mm256_zeroall() when calling the library function from AVX code.
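A CPU dispatcher of the kind mentioned above might look roughly like this. It is a minimal sketch assuming GCC or Clang on x86-64: `__builtin_cpu_init`, `__builtin_cpu_supports` and the `target("avx")` attribute are compiler extensions, and the function names (`add_sse`, `add_avx`, `add_arrays`) are invented for the example. MSVC would need a different detection mechanism.

```c
#include <immintrin.h>
#include <stddef.h>

/* Baseline path: SSE only, usable on any x86-64 CPU. */
static void add_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i,
                      _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; i++)          /* scalar tail */
        out[i] = a[i] + b[i];
}

/* AVX path: only this function is compiled with AVX enabled. */
__attribute__((target("avx")))
static void add_avx(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(out + i,
                         _mm256_add_ps(_mm256_loadu_ps(a + i),
                                       _mm256_loadu_ps(b + i)));
    for (; i < n; i++)          /* scalar tail */
        out[i] = a[i] + b[i];
}

typedef void (*add_fn)(const float *, const float *, float *, size_t);

/* Pick an implementation once, based on what the running CPU reports. */
static add_fn pick_add(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? add_avx : add_sse;
}

/* Public entry point: callers never see which path was chosen. */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    static add_fn impl = 0;     /* resolved on first call */
    if (!impl)
        impl = pick_add();
    impl(a, b, out, n);
}
```

Because each call runs entirely inside one path, the two encodings are never interleaved within a single routine. The lazy initialisation here is not synchronised; a multithreaded program might prefer to resolve the pointer once at startup.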
