CPU dispatcher for Visual Studio for AVX and SSE


Problem description


I work with two computers, one without AVX support and one with AVX. It would be convenient to have my code detect the instruction set supported by the CPU at run time and choose the appropriate code path. I've followed Agner Fog's suggestions for making a CPU dispatcher (http://www.agner.org/optimize/#vectorclass). However, on my machine without AVX, code compiled and linked with Visual Studio with AVX enabled crashes when I run it.

For example, I have two source files: one compiled with the SSE2 instruction set and containing some SSE2 instructions, and another compiled with the AVX instruction set and containing some AVX instructions. Even if my main function references only the SSE2 functions, the code still crashes, simply because one of the source files is compiled with AVX enabled and contains AVX instructions. Any clues on how I can fix this?
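Choosing a code path at run time starts with asking whether AVX is actually usable, which requires both CPU support and OS support for saving the 256-bit YMM registers. The sketch below is an illustration, not part of the original post: `cpu_has_avx` is a hypothetical name, and it assumes an x86/x86-64 target. It checks the AVX and OSXSAVE bits of CPUID leaf 1, then reads XCR0 to confirm the OS has enabled XMM and YMM state. On MSVC it uses the `__cpuid` and `_xgetbv` intrinsics; on GCC/Clang it uses `__get_cpuid` from `<cpuid.h>` plus an inline `xgetbv`. In the vector class library, `instrset_detect()` fills this role.

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid, _xgetbv
#else
#include <cpuid.h>    // __get_cpuid (GCC/Clang)
#endif

// Hypothetical helper: true only if the CPU supports AVX *and* the OS
// saves/restores YMM registers on context switches.
bool cpu_has_avx() {
    unsigned int ecx = 0;
#if defined(_MSC_VER)
    int regs[4];
    __cpuid(regs, 1);                          // leaf 1: feature flags
    ecx = static_cast<unsigned int>(regs[2]);
#else
    unsigned int eax = 0, ebx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
#endif
    const bool osxsave = (ecx >> 27) & 1u;     // OS uses XSAVE/XRSTOR
    const bool avx = (ecx >> 28) & 1u;         // CPU implements AVX
    if (!osxsave || !avx) return false;        // xgetbv is only valid with OSXSAVE

#if defined(_MSC_VER)
    const unsigned long long xcr0 = _xgetbv(0);
#else
    unsigned int lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    const unsigned long long xcr0 = (static_cast<unsigned long long>(hi) << 32) | lo;
#endif
    // XCR0 bit 1 = SSE (XMM) state, bit 2 = AVX (YMM) state; both must be enabled.
    return (xcr0 & 0x6) == 0x6;
}
```

Detection alone does not prevent the crash described here; as the solution below explains, the problem is which compiled copies of shared inline functions end up in the executable.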

Edit: Okay, I think I isolated the problem. I'm using Agner Fog's vector class and I have defined three source files as:

//file sse2.cpp - compiled with /arch:SSE2
#include "vectorclass.h"
float func_sse2(const float* a) {
    Vec8f v1 = Vec8f().load(a);
    float sum = horizontal_add(v1);
    return sum;
}
//file avx.cpp - compiled with /arch:AVX
#include "vectorclass.h"
float func_avx(const float* a) {
    Vec8f v1 = Vec8f().load(a);
    float sum = horizontal_add(v1);
    return sum;
}
//file foo.cpp - compiled with /arch:SSE2
#include <stdio.h>
extern float func_sse2(const float* a);
extern float func_avx(const float* a);
int main() {
    float (*fp)(const float*a); 
    float a[] = {1,2,3,4,5,6,7,8};
    int iset = 6;  // hard-coded for this test (below the AVX threshold of 7)
    if(iset>=7) { 
        fp = func_avx;  
    }
    else { 
        fp = func_sse2;
    }
    float sum = (*fp)(a);
    printf("sum %f\n", sum);
}
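Here `iset` is hard-coded. One common way to make the choice once at run time is a function pointer that initially points at a dispatcher, which rebinds the pointer on its first call. In this sketch `detect_iset()` is a hypothetical stub returning 6, like the question, which real code would replace with VCL's `instrset_detect()`. Both implementations are plain scalar stand-ins so the example builds anywhere.

```cpp
#include <cassert>

// Scalar stand-ins for the question's func_sse2 / func_avx. In the real
// program they live in sse2.cpp and avx.cpp, built with different /arch flags.
float func_sse2(const float* a) {
    float s = 0.0f;
    for (int i = 0; i < 8; ++i) s += a[i];
    return s;
}
float func_avx(const float* a) { return func_sse2(a); }  // placeholder body

// Hypothetical stub: real code would query the CPU here.
int detect_iset() { return 6; }

float dispatch_sum(const float* a);

// Starts out pointing at the dispatcher; the first call rebinds it.
float (*sum_ptr)(const float*) = &dispatch_sum;

float dispatch_sum(const float* a) {
    // Same threshold as the question: iset >= 7 selects the AVX path.
    sum_ptr = (detect_iset() >= 7) ? &func_avx : &func_sse2;
    return sum_ptr(a);
}
```

Callers always go through `sum_ptr(a)`. The rebinding is not synchronized, so if several threads could make the first call at the same time, it is simpler to resolve the pointer once during start-up.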

This crashes. If I instead use Vec4f in func_sse2, it does not crash. I don't understand this. I can use Vec8f with SSE2 on its own, as long as I don't have another source file compiled with AVX. Agner Fog's manual says:

"There is no advantage in using the 256-bit floating point vector classes (Vec8f, Vec4d) unless the AVX instruction set is specified, but it can be convenient to use these classes anyway if the same source code is used with and without AVX. Each 256-bit vector will simply be split up into two 128-bit vectors when compiling without AVX."

However, when I have two source files using Vec8f, one compiled with SSE2 and one compiled with AVX, I get a crash.
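To make the manual's remark concrete, here is a hypothetical, much-simplified stand-in (not VCL's actual class) that holds eight floats as two SSE `__m128` halves and computes the horizontal sum with SSE intrinsics only. An AVX build of the real `Vec8f` works on a single `__m256` instead (the map files in the solution show `Vec4f`/`get_low`/`get_high` helpers only in sse2.obj and an `__m256` conversion in avx.obj), so same-named member functions compile to different machine code in the two object files. `Vec8fPair` and `horizontal_add_pair` are illustrative names.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_store_ps

// Hypothetical stand-in: eight floats kept as two 128-bit halves.
struct Vec8fPair {
    __m128 lo, hi;
    Vec8fPair& load(const float* p) {  // unaligned load of 8 floats
        lo = _mm_loadu_ps(p);
        hi = _mm_loadu_ps(p + 4);
        return *this;
    }
};

// SSE-only horizontal sum: add the halves lane-wise, then sum the 4 lanes.
float horizontal_add_pair(const Vec8fPair& v) {
    __m128 s = _mm_add_ps(v.lo, v.hi);
    alignas(16) float t[4];
    _mm_store_ps(t, s);
    return (t[0] + t[1]) + (t[2] + t[3]);
}
```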

Edit2: I can get it to work from the command line

>cl -c sse2.cpp
>cl -c /arch:AVX avx.cpp
>cl foo.cpp sse2.obj avx.obj
>foo.exe

Edit3: This, however, crashes

>cl -c sse2.cpp
>cl -c /arch:AVX avx.cpp
>cl foo.cpp avx.obj sse2.obj
>foo.exe

Another clue. Apparently, the order of linking matters. It crashes if avx.obj is before sse2.obj but if sse2.obj is before avx.obj it does not crash. I'm not sure if it chooses the correct code path (I don't have access to my AVX system right now) but at least it does not crash.

Solution

I realise that this is an old question and that the person who asked it appears to be no longer around, but I hit the same problem yesterday. Here's what I worked out.

When compiled, your sse2.cpp and avx.cpp files both produce object files that contain not only your functions but also any inline or template functions they require (e.g. Vec8f::load). These functions are also compiled using the requested instruction set.

This means that your sse2.obj and avx.obj object files will both contain a definition of Vec8f::load, each compiled using its respective instruction set.

However, since the compiler treats Vec8f::load as externally visible, it puts it in a 'COMDAT' section of the object file with a 'selectany' (aka 'pick any') label. This tells the linker that if it sees multiple definitions of this symbol, for example in two different object files, it is allowed to pick any one it likes. (It does this to reduce duplicate code in the final executable, which would otherwise be inflated by multiple definitions of template and inline functions.)

The problem you are having is directly related to this in that the order of the object files passed to the linker is affecting which one it picks. Specifically here, it appears to be picking the first definition it sees.

If this is avx.obj, the AVX-compiled version of Vec8f::load will always be used, which will crash on a machine that doesn't support that instruction set. If, on the other hand, sse2.obj comes first, the SSE2-compiled version will always be used. This won't crash, but it will use only SSE2 instructions even when AVX is supported.
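The link-order effect can be pictured with a toy "first definition wins" rule, which is what the map files below suggest is happening here. This is a simplified model, not the MSVC linker's actual algorithm: a 'pick any' COMDAT leaves the choice to the linker, so correct code must not depend on it. `ObjFile` and `resolve_first_wins` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// An object file in this toy model: its name plus the COMDAT symbols it defines.
struct ObjFile {
    std::string name;
    std::vector<std::string> comdat_symbols;
};

// For each symbol, record the first object file (in link order) that defines it.
std::map<std::string, std::string> resolve_first_wins(const std::vector<ObjFile>& objs) {
    std::map<std::string, std::string> chosen;  // symbol -> object file supplying it
    for (const auto& obj : objs)
        for (const auto& sym : obj.comdat_symbols)
            chosen.emplace(sym, obj.name);  // emplace never overwrites an existing entry
    return chosen;
}
```

With sse2.obj listed first, `Vec8f::load` resolves to the SSE2 copy; swapping the order makes every caller, including `func_sse2`, use the AVX copy.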

You can confirm this by looking at the linker map file (produced using the /map option). Here are the relevant (edited) excerpts:

//
// link with sse2.obj before avx.obj
//
0001:00000080  _main                             foo.obj
0001:00000330  ?func_sse2@@YAMPBM@Z              sse2.obj
0001:00000420  ??0Vec256fe@@QAE@XZ               sse2.obj
0001:00000440  ??0Vec4f@@QAE@ABT__m128@@@Z       sse2.obj
0001:00000470  ??0Vec8f@@QAE@XZ                  sse2.obj <-- sse2 version used
0001:00000490  ??BVec4f@@QBE?AT__m128@@XZ        sse2.obj
0001:000004c0  ?get_high@Vec8f@@QBE?AVVec4f@@XZ  sse2.obj
0001:000004f0  ?get_low@Vec8f@@QBE?AVVec4f@@XZ   sse2.obj
0001:00000520  ?load@Vec8f@@QAEAAV1@PBM@Z        sse2.obj <-- sse2 version used
0001:00000680  ?func_avx@@YAMPBM@Z               avx.obj
0001:00000740  ??BVec8f@@QBE?AT__m256@@XZ        avx.obj

//
// link with avx.obj before sse2.obj
//
0001:00000080  _main                             foo.obj
0001:00000270  ?func_avx@@YAMPBM@Z               avx.obj
0001:00000330  ??0Vec8f@@QAE@XZ                  avx.obj <-- avx version used
0001:00000350  ??BVec8f@@QBE?AT__m256@@XZ        avx.obj
0001:00000380  ?load@Vec8f@@QAEAAV1@PBM@Z        avx.obj <-- avx version used
0001:00000580  ?func_sse2@@YAMPBM@Z              sse2.obj
0001:00000670  ??0Vec256fe@@QAE@XZ               sse2.obj
0001:00000690  ??0Vec4f@@QAE@ABT__m128@@@Z       sse2.obj
0001:000006c0  ??BVec4f@@QBE?AT__m128@@XZ        sse2.obj
0001:000006f0  ?get_high@Vec8f@@QBE?AVVec4f@@XZ  sse2.obj
0001:00000720  ?get_low@Vec8f@@QBE?AVVec4f@@XZ   sse2.obj

As for fixing it, that's another matter. In this case, the following blunt hack should work by forcing the avx version to have its own differently named versions of the template functions. This will increase the resulting executable size as it will contain multiple versions of the same function even if the sse2 and avx versions are identical.

// avx.cpp
namespace AVXWrapper {
#include "vectorclass.h"
}
using namespace AVXWrapper;

float func_avx(const float* a)
{
    ...
}

There are some important limitations, though: (a) if the included file manages any form of global state, that state will no longer be truly global, because you will have two 'semi-global' versions; and (b) you won't be able to pass vectorclass variables as parameters between functions defined in avx.cpp and the rest of your code.
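Both limitations follow from the namespace wrapper creating a separate copy of everything in the header. In this sketch, the hypothetical macro `FAKE_VECTOR_HEADER` stands in for the header's contents: a toy `Vec8f` struct and a function-local static counter. Expanding it in two namespaces yields two unrelated `Vec8f` types (limitation b) and two independent counters (limitation a). Passing plain `const float*` data across the boundary, as `func_avx` already does, sidesteps (b). `sum_avx_side` is an illustrative name.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a header's contents: a vector type plus some state.
#define FAKE_VECTOR_HEADER                                      \
    struct Vec8f { float v[8]; };                               \
    inline int& call_counter() { static int n = 0; return n; }

namespace SSE2Wrapper { FAKE_VECTOR_HEADER }
namespace AVXWrapper { FAKE_VECTOR_HEADER }

// (b) The wrapped copies are distinct types, so one cannot be passed where
// the other is expected.
static_assert(!std::is_same<SSE2Wrapper::Vec8f, AVXWrapper::Vec8f>::value,
              "the wrapped copies are distinct types");

// A boundary that exchanges only plain floats works from either side.
float sum_avx_side(const float* a) {
    AVXWrapper::Vec8f v{};
    for (int i = 0; i < 8; ++i) v.v[i] = a[i];
    float s = 0.0f;
    for (float x : v.v) s += x;
    ++AVXWrapper::call_counter();  // (a) touches only the AVX-side copy of the state
    return s;
}
```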
