快速算法计算出圆周率并行 [英] Fast algorithm to calculate Pi in parallel

查看:591
本文介绍了快速算法计算出圆周率并行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我开始学习CUDA,我觉得计算隆皮数字将是一个不错的,介绍的项目。

I am starting to learn CUDA and I think calculating long digits of pi would be a nice, introductory project.

我已经实现了简单的蒙特卡洛方法,它是很容易并行化,能。我只是让每个线程随机对单位正方形产生点,找出使用还原操作在单位圆中的许多谎言,以及如何从中总结出的结果。

I have already implemented the simple Monte Carlo method which is easily parallelize-able. I simply have each thread randomly generate points on the unit square, figure out how many lie within the unit circle, and tally up the results using a reduction operation.

但是,这肯定不是最快的算法计算常数。以前,当我做了一个单线程CPU此练习中,我使用梅钦类公式来执行计算远远快收敛。对于那些有兴趣,这涉及到EX pressing pi作为arctangents的总和,用泰勒级数来评估前pression。

But that is certainly not the fastest algorithm for calculating the constant. Before, when I did this exercise on a single threaded CPU, I used Machin-like formulae to do the calculation for far faster convergence. For those interested, this involves expressing pi as the sum of arctangents and using Taylor series to evaluate the expression.

这样一个公式的一个例子:

An example of such a formula:

不幸的是,我发现,这种并行技术来成千上万的GPU线程是不容易的。的问题是,大多数的操作的简单操作的高precision数学而不是做浮点运算的数据的长矢量

Unfortunately, I found that parallelizing this technique to thousands of GPU threads is not easy. The problem is that the majority of the operations are simply doing high precision math as opposed to doing floating point operations on long vectors of data.

所以,我想知道,什么是计算在GPU上的任意长的圆周率数字的最有效方法是什么?

推荐答案

您应该使用<一个href="http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula">Bailey–Borwein–Plouffe公式

为什么呢?首先,你需要一个算法,它可以分解。所以,这来到我想到的第一件事是有圆周率是一个无限的总和再presentation。然后,每个处理器只计算一个任期,并为您总结他们都在最后。

Why? First of all, you need an algorithm that can be broken down. So, the first thing that came to my mind is having a representation of pi as an infinite sum. Then, each processor just computes one term, and you sum them all in the end.

然后,它是preferable每个处理器操纵小型$ P ​​$ pcision值,而不是非常高precision的。例如,如果你想要一个十亿小数,并且使用了一些用在这里 ,像楚德诺夫斯基算法,你的每个处理器需要操纵十亿长的数字。这根本就不是一个GPU适当的方法。

Then, it is preferable that each processor manipulates small-precision values, as opposed to very high precision ones. For example, if you want one billion decimals, and you use some of the expressions used here, like the Chudnovsky algorithm, each of your processor will need to manipulate a billion long number. That's simply not the appropriate method for a GPU.

因此​​,所有的一切,BBP公式可以让你单独计算圆周率的数字(该算法是非常酷),并与低precision处理器!阅读BBP数字提取算法π

So, all in all, the BBP formula will allow you to compute the digits of pi separately (the algorithm is very cool), and with "low precision" processors! Read the "BBP digit-extraction algorithm for π"

优点计算π   这种算法计算π,而不需要有数千甚至数百万的数字自定义数据类型。该方法计算的第n个位数,而不计算前n - 1个数字,并能使用小型,高效的数据类型。   该算法是计算第n个位(或第n个的邻域的几个位)以最快的方式,但使用大的数据类型的π计算算法保持当目标是计算所有的数字从1到n更快。

Advantages of the BBP algorithm for computing π This algorithm computes π without requiring custom data types having thousands or even millions of digits. The method calculates the nth digit without calculating the first n − 1 digits, and can use small, efficient data types. The algorithm is the fastest way to compute the nth digit (or a few digits in a neighborhood of the nth), but π-computing algorithms using large data types remain faster when the goal is to compute all the digits from 1 to n.

这篇关于快速算法计算出圆周率并行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆