Which AVX and march should be specified on a cluster with different architectures?


Problem description

I'm currently trying to compile software for use on an HPC cluster using the Intel compilers. The login node, where I compile and prepare the computations, uses Intel Xeon Gold 6148 processors, while the compute nodes use either Haswell processors (Intel Xeon E5-2660 v3 / Intel Xeon E5-2680 v3) or Skylake processors (Intel Xeon Gold 6138).

As far as I understand, my login node supports Intel SSE4.2, Intel AVX, Intel AVX2, and Intel AVX-512, but my compute nodes support only either Intel AVX2 (Haswell) or Intel AVX-512 (Skylake).

If I compile with the option -xHost on the login node, it should automatically use the highest instruction set available. But which one is the highest? And how can I ensure that my program runs on both compute systems with the best performance? Do I have to compile two versions? Bonus question: which -march do I have to specify in this case?

Recommended answer

Since you are using the Intel compiler, you can use its "Automatic Processor Dispatch" capability to create "fat" generic binaries that contain an SSE-compatible version, an AVX-compatible version, and so on, all in one file. When you run the "fat" binary on an SSE-only machine, only the SSE-optimized part (code path) of the binary is executed. When you run the same "fat" binary on an AVX machine, the AVX-optimized part is executed. This is a very powerful and not very well known feature.
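The dispatch idea can be pictured with a small Python sketch. It is only an illustration, not what the Intel runtime actually does: it assumes a Linux node, where `/proc/cpuinfo` lists feature flags such as `sse4_2`, `avx`, `avx2` and `avx512f`, and the helper names `cpu_flags` and `pick_code_path` are made up for this example. Running it on each node type also shows which instruction set is the highest one available there.

```python
from pathlib import Path

# Newest ISA first; the names are the Linux /proc/cpuinfo flag spellings.
ISA_PRIORITY = ["avx512f", "avx2", "avx", "sse4_2"]


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_code_path(flags):
    """Pick the newest ISA the CPU reports, like a dispatcher would."""
    for isa in ISA_PRIORITY:
        if isa in flags:
            return isa
    return "generic"  # hypothetical label for the fallback path


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print("code path that would be selected:", pick_code_path(cpu_flags(text)))
```

On a Haswell node this sketch should select `avx2`, and on a Skylake server node `avx512f`, which mirrors the two code paths a dispatched binary would carry for this cluster.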

You can enable it by combining the -ax and -x Intel compiler flags. The idea is that you specify the highest ISA(s) via -ax and the default ("lowest") ISA via -x.

The "-ax" fat-binary technique is briefly described at https://www.chpc.utah.edu/documentation/software/single-executable.php#submit

More details can be found on page 9 of this slide deck: https://www.alcf.anl.gov/files/ken_intel_compiler_optimization.pdf

Finally, I should mention that your description slightly confuses how the ISAs relate to each other. Intel x86 processors with AVX-512 always support AVX2, and AVX2 machines always support SSE. A heavily simplified explanation: AVX-512 is more or less a superset of AVX/AVX2, while AVX/AVX2 can be seen as a superset of SSE (strictly speaking it is not, but SSE is always available on AVX machines, and not vice versa).

In your case you mentioned Haswell (an AVX2 machine, so SSE is on board, but naturally no AVX-512) and Skylake (an AVX-512 machine, so AVX2 and SSE are on board). You seem to have only Skylake and Haswell servers, so there are no machines below AVX2 in your list, i.e. no SSE-only or AVX(1)-only machines. Therefore you probably need something like -axCORE-AVX512 -xCORE-AVX2. Note that -xHost on the Skylake login node would target AVX-512, so a binary built that way should not be expected to run on the Haswell nodes.
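The rule "lowest ISA goes to -x, everything higher goes to -ax" can be written down as a small helper. This is a sketch under assumptions: the function name `intel_dispatch_flags` and the file names `app` and `app.c` are hypothetical, and the list only covers the four ISA values discussed here (`SSE4.2`, `AVX`, `CORE-AVX2`, `CORE-AVX512`). Check your compiler version's documentation for the values it accepts.

```python
# Intel -x / -ax option values, ordered from oldest to newest ISA.
ISA_ORDER = ["SSE4.2", "AVX", "CORE-AVX2", "CORE-AVX512"]


def intel_dispatch_flags(node_isas):
    """Build -ax/-x flags from the ISA level of each node type."""
    levels = sorted({ISA_ORDER.index(isa) for isa in node_isas})
    if not levels:
        raise ValueError("at least one node ISA is required")
    baseline, *extra = levels
    flags = []
    if extra:
        # Higher ISAs become additional dispatched code paths.
        flags.append("-ax" + ",".join(ISA_ORDER[i] for i in reversed(extra)))
    # The lowest ISA is the baseline every node must support.
    flags.append("-x" + ISA_ORDER[baseline])
    return flags


if __name__ == "__main__":
    flags = intel_dispatch_flags(["CORE-AVX2", "CORE-AVX512"])
    # 'app' and 'app.c' are placeholder names for this illustration.
    print("icc -O2", *flags, "-o app app.c")
```

For the Haswell plus Skylake cluster described in the question, the helper yields the same pair of flags suggested above. If older SSE-only nodes were ever added, the baseline would drop to `-xSSE4.2` and the newer ISAs would move into the `-ax` list.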
