OpenCL pi calculation


Problem description


Can anybody point me in the right direction for writing code that calculates pi using OpenCL? If you have any sample pi calculator code, please share it with me. Thanks for your interest.

Solution

The first question is "How many decimal places?" 1000000, 10000000, 100000000, 1000000000... ? This will influence the choice of an algorithm.

Then there is the issue of parallelizing the algorithm, possibly just by parallelizing the extended-precision arithmetic primitives (+, -, *, /, sqrt). Maybe this has already been done...


If only about 20 digits are needed, then parallelizing the extended-precision arithmetic is pointless.

I recommend using the Machin formula:

pi = 16 arctg(1/5) - 4 arctg(1/239),

where each arctg is evaluated using its Maclaurin expansion

arctg(x) = Sum over i >= 0 of (-1)^i x^(2i+1)/(2i+1).

(Compute the powers of x by recurrence.)
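As a rough guide to how many terms each series needs for a given number of decimal places, here is a small sketch. It relies on the fact that the series alternates with decreasing terms, so the truncation error is bounded by the first omitted term. The helper name terms_needed is hypothetical and not part of the original answer.

```python
import math

def terms_needed(n, digits):
    """Number of leading terms of the arctg(1/n) series to keep so that
    the first omitted term is smaller than 10**-digits."""
    i = 0
    # Term i has magnitude n**-(2i+1) / (2i+1); compare in log10 to avoid underflow.
    while (2 * i + 1) * math.log10(n) + math.log10(2 * i + 1) <= digits:
        i += 1
    return i

if __name__ == "__main__":
    for n in (5, 239):
        print(n, terms_needed(n, 20))
```

The series for arctg(1/5) dominates the cost; the one for arctg(1/239) converges much faster.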

20 digits can nearly be handled by a 64-bit integer (fixed-point), but you'll need a bit more. With three 32-bit integers, you are on the safe side. For convenience, you can work in a decimal base; 1000000000 (10^9) is the largest power of ten that fits in a 32-bit integer.

You'll need to implement long addition, plus multiplication and division of a long number by a small integer. Given the small operand length, efficient multiplication algorithms (like Karatsuba) are probably not worth the effort.
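The fixed-point arithmetic described above could look roughly like the following sketch. It assumes base 10^9 limbs (so each limb fits in 32 bits) with one integer limb and three fractional limbs; the layout and all function names are illustrative choices, not taken from the original answer. Python integers never overflow, so the comments note where a C or OpenCL version would need 64-bit intermediates.

```python
BASE = 10**9        # one limb holds nine decimal digits and fits in 32 bits
FRAC_LIMBS = 3      # assumption: 27 fractional decimal digits

def from_int(n):
    """Fixed-point number: one integer limb followed by fractional limbs."""
    return [n] + [0] * FRAC_LIMBS

def add(a, b):
    out, carry = [0] * len(a), 0
    for i in reversed(range(len(a))):
        s = a[i] + b[i] + carry
        out[i], carry = s % BASE, s // BASE
    return out

def sub(a, b):
    """a - b, assuming a >= b."""
    out, borrow = [0] * len(a), 0
    for i in reversed(range(len(a))):
        d = a[i] - b[i] - borrow
        borrow = 1 if d < 0 else 0
        out[i] = d + BASE if d < 0 else d
    return out

def mul_small(a, m):
    """Multiply by a small integer (needs a 64-bit intermediate in C)."""
    out, carry = [0] * len(a), 0
    for i in reversed(range(len(a))):
        p = a[i] * m + carry
        out[i], carry = p % BASE, p // BASE
    return out

def div_small(a, d):
    """Divide by a small integer, truncating (needs a 64-bit intermediate in C)."""
    out, rem = [0] * len(a), 0
    for i in range(len(a)):
        cur = rem * BASE + a[i]
        out[i], rem = cur // d, cur % d
    return out

def arctan_inv(n):
    """arctg(1/n) by the Maclaurin series, powers computed by recurrence."""
    term = div_small(from_int(1), n)      # x^1
    total = term
    n2, k, positive = n * n, 3, False
    while True:
        term = div_small(term, n2)        # next odd power of x
        piece = div_small(term, k)        # x^k / k
        if not any(piece):                # below the fixed-point resolution
            break
        total = add(total, piece) if positive else sub(total, piece)
        positive = not positive
        k += 2
    return total

def machin_pi():
    return sub(mul_small(arctan_inv(5), 16), mul_small(arctan_inv(239), 4))

def to_str(a):
    return str(a[0]) + "." + "".join("%09d" % limb for limb in a[1:])

if __name__ == "__main__":
    print(to_str(machin_pi()))
```

Because every division truncates, the last few printed digits are unreliable; the extra limb beyond the 20 digits wanted is what absorbs that error.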

Some hints for parallelization:

- Let every processor compute a range of consecutive terms. Each processor then needs to start at some power of x (powers 2kM+1, 2kM+3, 2kM+5, ... for processor k among N, with M terms per processor), hence the need for fast power computation (by repeated squaring) to initialize.

- Alternatively, each processor accumulates every Nth term (powers 2k+1, 2N+2k+1, 4N+2k+1, 6N+2k+1, ... for processor k among N), multiplying each time by x^(2N) and adjusting the sign by (-1)^N.
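The two partitioning schemes above can be sketched in plain Python, with each "processor" simulated by a function call; in OpenCL each call would correspond to one work-item, followed by a reduction of the partial sums. This is a floating-point illustration only, and the function names and the parameter m (terms per worker) are hypothetical.

```python
import math

def power_by_squaring(x, n):
    """x**n for a non-negative integer n using repeated squaring."""
    result = 1.0
    while n > 0:
        if n & 1:
            result *= x
        x *= x
        n >>= 1
    return result

def block_partial(x, k, m):
    """Worker k sums the m consecutive terms starting at index k*m."""
    first = k * m
    power = power_by_squaring(x, 2 * first + 1)   # starting power x^(2km+1)
    sign = -1.0 if first % 2 else 1.0
    total = 0.0
    for i in range(first, first + m):
        total += sign * power / (2 * i + 1)
        power *= x * x
        sign = -sign
    return total

def strided_partial(x, k, n, terms):
    """Worker k of n sums terms k, k+n, k+2n, ... below `terms`."""
    power = power_by_squaring(x, 2 * k + 1)       # starting power x^(2k+1)
    step = power_by_squaring(x, 2 * n)            # x^(2n) between own terms
    sign = -1.0 if k % 2 else 1.0
    flip = -1.0 if n % 2 else 1.0                 # (-1)^n between own terms
    total = 0.0
    for i in range(k, terms, n):
        total += sign * power / (2 * i + 1)
        power *= step
        sign *= flip
    return total

def arctan_block(x, n, m):
    return sum(block_partial(x, k, m) for k in range(n))

def arctan_strided(x, n, terms):
    return sum(strided_partial(x, k, n, terms) for k in range(n))

if __name__ == "__main__":
    pi_est = 16 * arctan_strided(1 / 5, 4, 16) - 4 * arctan_strided(1 / 239, 4, 16)
    print(pi_est, abs(pi_est - math.pi))
```

The block scheme pays for one larger power per worker up front, while the strided scheme needs only x^(2k+1) and x^(2N); both give every worker the same number of terms.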

Below is a very crude floating-point implementation of Machin's formula in Python:

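def ArcTg(X):
    # Maclaurin series: X - X^3/3 + X^5/5 - ... (terms up to X^15)
    Sum = X
    Term = X
    Y = -X * X  # multiplying by Y raises the power by two and flips the sign

    for I in range(3, 17, 2):
        Term *= Y
        Sum += Term / I

    return Sum

print(16 * ArcTg(1. / 5) - 4 * ArcTg(1. / 239))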


Please visit my web site at http://domemtech.com/?p=669. This page from my blog is a cursory comparison of CUDA vs. OpenCL using just your example, estimating the value of pi.

Basically, the solution is numerical integration using the composite Simpson's rule. It uses IEEE 754 single-precision floating point, so its precision is very limited.
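For readers who want to see the idea without visiting the blog, here is a minimal Python sketch of composite Simpson integration. The integrand 4/(1+x^2) on [0, 1], whose integral equals pi, is an assumption, since the post does not say which integral the blog code uses. This sketch also runs in Python's double precision rather than single precision, and simpson_pi is a hypothetical name.

```python
import math

def simpson_pi(n):
    """Composite Simpson's rule for the integral of 4/(1+x^2) over [0, 1].
    n is the number of subintervals and must be even."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = 1.0 / n

    def f(x):
        return 4.0 / (1.0 + x * x)

    total = f(0.0) + f(1.0)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd i) and 2 (even i).
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

if __name__ == "__main__":
    estimate = simpson_pi(1000)
    print(estimate, abs(estimate - math.pi))
```

Each function evaluation is independent of the others, which is what makes this approach easy to map onto GPU work-items followed by a sum reduction.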

If you have questions, or cannot get it to run, please let me know.

Ken Domino

