How many clock cycles does AVX/SSE exponentiation cost on a modern x86_64 CPU?


Question

How many clock cycles does AVX/SSE exponentiation cost on a modern x86_64 CPU?

That is, pow(x, y) = exp(y * log(x)).

Do the exp() and log() AVX instructions on x86_64 each require a known, fixed number of cycles?

Or does the number of cycles vary with the exponent, and if so, is there a maximum number of cycles an exponentiation can cost?

Recommended answer

The x86 SIMD instruction sets (as opposed to x87), at least up to AVX2, do not include SIMD exp, log, or pow instructions. The one exception is pow(x, 0.5), which is the square root.

There are, however, SIMD math libraries that build these functions (among others) out of SIMD instructions. Intel's SVML includes:

__m256 _mm256_exp_ps(__m256)
__m256 _mm256_log_ps(__m256)
__m256 _mm256_pow_ps(__m256, __m256)

Intel disingenuously calls these intrinsics, when they are in fact functions made up of several instructions. SVML is closed source and expensive. However, after installing the Intel OpenCL runtime I searched for svml and found some SVML files in the OpenCL directories, so I think you can get SVML indirectly through Intel's OpenCL runtime.

AMD also provides a SIMD math library called LibM. It is closed source but free, and it has its own SIMD math functions:

__m128 amd_vrs4_expf(__m128)
__m128 amd_vrs4_logf(__m128)
__m128 amd_vrs4_powf(__m128, __m128)

Agner Fog's Vector Class Library provides an interface to SVML and LibM. See the file vectormath_lib.h. From it you can work out the corresponding functions in SVML and LibM.

Agner also provides his own code for these functions, which he claims is competitive with the proprietary Intel and AMD versions. For Agner's versions, look in vectormath_exp.h, e.g. at exp_f, log_f, and pow_template_f, and then look at the generated assembly.

You can use SVML, LibM, and Agner's own functions to time the exp and log functions. However, you should know that SVML and LibM don't play well on each other's hardware. AMD's library, for example, is optimized for FMA4, which Intel does not have (Intel originally planned to have FMA4 and then suddenly changed to FMA3 after AMD had already planned for FMA4). Intel appears to do something, ummm... well, I suggest you read about it.

So if you time SVML on an AMD processor or LibM on an Intel processor, you will likely get very different performance results (unless you manage to replace Intel's CPU dispatch function). Unlike GPUs, the x86 instruction set is publicly documented, so you can build your own exp and log functions, and that is what Agner has done.

Update

Glibc 2.22 (which should come out soon) has a vector math library called libmvec. Apparently it is enabled starting at -O1 together with -ffast-math and -fopenmp. I'm not sure why fast-math and OpenMP are necessary (particularly in the example below, where associative math is not needed), but it's great to finally have a SIMD math library in the GNU C library.

// Build: gcc ./cos.c -O1 -fopenmp -ffast-math -lm -mavx2
#include <math.h>

int N = 3200;
double b[3200];
double a[3200];

int main (void)
{
  int i;

  #pragma omp simd
  for (i = 0; i < N; i += 1)
  {
    b[i] = cos (a[i]);
  }

  return (0);
}
