快速算法并行计算Pi [英] Fast algorithm to calculate Pi in parallel

查看:916
本文介绍了快速算法并行计算Pi的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我开始学习CUDA,我认为计算pi的长位数将是一个很好的介绍性项目。



我已经实现了简单的蒙特卡罗方法容易并行化。我简单地让每个线程在单位平方上随机生成点,计算出单位圆内有多少个点,然后使用简化运算来计算结果。



肯定不是用于计算常数的最快的算法。之前,当我在单线程CPU上执行此练习时,我使用



为什么?首先,你需要一个可以分解的算法。所以,我想到的第一件事是将pi表示为一个无限和。然后,每个处理器只计算一个项,并且最后将它们总和。



然后,最好每个处理器处理小精度值,而不是非常高精度的。例如,如果您想要10亿个小数,并使用此处 a>,像 Chudnovsky算法一样,每个处理器都需要处理十亿长数字。这不是GPU的适当方法。



所以,总而言之,BBP公式将允许你单独计算pi的数字(算法非常酷),并与低精度处理器!阅读用于π的BBP数字提取算法


BBP算法的优点计算π
此算法计算π,而不需要具有数千甚至数百万位数字的自定义数据类型。该方法计算第n个数字而不计算前n-1个数字,并且可以使用小而有效的数据类型
该算法是计算第n个数字(或在第n个邻域中的几个数字)的最快方法,但是当目标是计算从1开始的所有数字时,使用大数据类型的π计算算法更快到n。



I am starting to learn CUDA and I think calculating long digits of pi would be a nice, introductory project.

I have already implemented the simple Monte Carlo method which is easily parallelize-able. I simply have each thread randomly generate points on the unit square, figure out how many lie within the unit circle, and tally up the results using a reduction operation.

But that is certainly not the fastest algorithm for calculating the constant. Before, when I did this exercise on a single threaded CPU, I used Machin-like formulae to do the calculation for far faster convergence. For those interested, this involves expressing pi as the sum of arctangents and using Taylor series to evaluate the expression.

An example of such a formula:

Unfortunately, I found that parallelizing this technique to thousands of GPU threads is not easy. The problem is that the majority of the operations are simply doing high precision math as opposed to doing floating point operations on long vectors of data.

So I'm wondering, what is the most efficient way to calculate arbitrarily long digits of pi on a GPU?

解决方案

You should use the Bailey–Borwein–Plouffe formula

Why? First of all, you need an algorithm that can be broken down. So, the first thing that came to my mind is having a representation of pi as an infinite sum. Then, each processor just computes one term, and you sum them all in the end.

Then, it is preferable that each processor manipulates small-precision values, as opposed to very high precision ones. For example, if you want one billion decimals, and you use some of the expressions used here, like the Chudnovsky algorithm, each of your processor will need to manipulate a billion long number. That's simply not the appropriate method for a GPU.

So, all in all, the BBP formula will allow you to compute the digits of pi separately (the algorithm is very cool), and with "low precision" processors! Read the "BBP digit-extraction algorithm for π"

Advantages of the BBP algorithm for computing π This algorithm computes π without requiring custom data types having thousands or even millions of digits. The method calculates the nth digit without calculating the first n − 1 digits, and can use small, efficient data types. The algorithm is the fastest way to compute the nth digit (or a few digits in a neighborhood of the nth), but π-computing algorithms using large data types remain faster when the goal is to compute all the digits from 1 to n.

这篇关于快速算法并行计算Pi的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆