Vectorized Trig functions in C?


Question

I'm looking to calculate highly parallelized trig functions (in blocks of about 1024 values), and I'd like to take advantage of at least some of the parallelism that modern architectures offer.

When I compile a block

for(int i=0; i<SIZE; i++) {
   arr[i]=sin((float)i/1024);
}

GCC won't vectorize it, and says

not vectorized: relevant stmt not supported: D.3068_39 = __builtin_sinf (D.3069_38);

That makes sense to me. However, I'm wondering whether there is a library for doing parallel trig computations.

With just a simple Taylor series up to the 11th order, GCC will vectorize all the loops, and I'm getting speeds over twice as fast as a naive sin loop (with bit-exact answers; with a 9th-order series, only a single bit is off for the last two out of 1600 values, for a >3x speedup). I'm sure someone has encountered a problem like this before, but when I search, I find no mention of any libraries or the like.
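For readers who want to see what such a loop might look like, here is a minimal sketch of the Taylor-series idea. It is not the code from the question: the names taylor_sin11 and fill_sin_table are hypothetical, the polynomial is written in Horner form, and there is no range reduction, so it is only reasonable for small inputs such as the [0, 1) range used above. Because the loop body is straight-line arithmetic with no library call, GCC has a chance to vectorize it (for example with something like gcc -O3 -fopt-info-vec, though whether it does depends on the compiler version and target).

```c
#include <stddef.h>

/* Hypothetical helper: 11th-order Taylor polynomial for sin(x),
   evaluated in Horner form on x^2. Assumes |x| is small (roughly
   |x| <= 1); there is no range reduction. */
static inline float taylor_sin11(float x)
{
    const float x2 = x * x;
    float p = -1.0f / 39916800.0f;   /* -1/11! */
    p = p * x2 + 1.0f / 362880.0f;   /* +1/9!  */
    p = p * x2 - 1.0f / 5040.0f;     /* -1/7!  */
    p = p * x2 + 1.0f / 120.0f;      /* +1/5!  */
    p = p * x2 - 1.0f / 6.0f;        /* -1/3!  */
    return x + x * x2 * p;           /* x - x^3/3! + x^5/5! - ... */
}

/* Hypothetical wrapper mirroring the loop in the question:
   arr[i] = sin(i / 1024) for i in [0, n). */
void fill_sin_table(float *restrict arr, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        arr[i] = taylor_sin11((float)i / 1024.0f);
    }
}
```

Calling fill_sin_table(arr, 1024) fills the same table as the original loop. The accuracy and speedup figures quoted above come from the asker's own measurements, not from this sketch.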

A. Is there something existing already?
B. If not, advice for optimizing parallel trig functions?

EDIT: I found a library called "SLEEF": http://shibatch.sourceforge.net/ which is described in this paper and uses SIMD instructions to calculate several elementary functions. It uses SSE- and AVX-specific code, but I don't think it will be hard to turn it into standard C loops.

Answer

My answer was to create my own library, called vectrig, to do exactly this: https://github.com/jeremysalwen/vectrig
