Python large iterations number fail

Problem description

I wrote a simple Monte Carlo π calculation program in Python, using the multiprocessing module. It works just fine, but when I pass 1E+10 iterations for each worker, some problem occurs and the result is wrong. I can't understand what the problem is, because everything is fine with 1E+9 iterations!

import sys
from multiprocessing import Pool
from random import random


def calculate_pi(iters):
    """ Worker function """

    points = 0  # points inside circle

    for i in iters:
        x = random()
        y = random()

        if x ** 2 + y ** 2 <= 1:
            points += 1

    return points


if __name__ == "__main__":

    if len(sys.argv) != 3:
        print "Usage: python pi.py workers_number iterations_per_worker"
        exit()

    procs = int(sys.argv[1])
    iters = float(sys.argv[2])  # 1E+8 is cool

    p = Pool(processes=procs)

    total = iters * procs
    total_in = 0

    for points in p.map(calculate_pi, [xrange(int(iters))] * procs):
        total_in += points

    print "Total: ", total, "In: ", total_in
    print "Pi: ", 4.0 * total_in / total


Answer

The problem seems to be that multiprocessing has a limit to the largest int it can pass to subprocesses inside an xrange. Here's a quick test:

import sys
from multiprocessing import Pool

def doit(n):
    print n

if __name__ == "__main__":
    procs = int(sys.argv[1])
    iters = int(float(sys.argv[2]))
    p = Pool(processes=procs)
    for points in p.map(doit, [xrange(int(iters))] * procs):
        pass

Now:

$ ./multitest.py 2 1E8
xrange(100000000)
xrange(100000000)
$ ./multitest.py 2 1E9
xrange(1000000000)
xrange(1000000000)
$ ./multitest.py 2 1E10
xrange(1410065408)
xrange(1410065408)

This is part of a more general problem with multiprocessing: It relies on standard Python pickling, with some minor (and not well documented) extensions to pass values. Whenever things go wrong, the first thing to check is that the values are arriving the way you expected.
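A minimal way to do that check (shown here in Python 3 syntax, where range replaces xrange and, notably, pickles its endpoints as arbitrary-precision ints, so this particular bug no longer occurs):

```python
import pickle

def roundtrips(obj):
    """Sanity check: does the object survive pickling unchanged?"""
    return pickle.loads(pickle.dumps(obj)) == obj

# Plain ints always pickle exactly, at any size.
print(roundtrips(10 ** 10))         # True
# Python 3's range also survives, unlike Python 2's xrange at this size.
print(roundtrips(range(10 ** 10)))  # True
```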

In fact, you can see this problem by playing with pickle, without even touching multiprocessing (which isn't always the case, because of those minor extensions, but often is):

>>> pickle.dumps(xrange(int(1E9)))
'c__builtin__\nxrange\np0\n(I0\nI1000000000\nI1\ntp1\nRp2\n.'
>>> pickle.dumps(xrange(int(1E10)))
'c__builtin__\nxrange\np0\n(I0\nI1410065408\nI1\ntp1\nRp2\n.'

Even without learning all the details of the pickle protocol, it should be obvious that the I1000000000 in the first case is 1E9 as an int, while the equivalent chunk in the next case is about 1.41E9, not 1E10, as an int. You can experiment with pickle yourself to confirm this.
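The mangled value isn't random: 1410065408 is exactly 1E10 wrapped to 32 bits, which is what you'd expect if the xrange endpoints are being squeezed through a 32-bit C long somewhere during pickling. A quick arithmetic check (Python 3 syntax):

```python
# 1E10 doesn't fit in a signed 32-bit C long; wrapping it to 32 bits
# reproduces the bogus endpoint seen in the pickled xrange above.
print(10 ** 10 % 2 ** 32)  # 1410065408
# 1E9 fits in 32 bits, so it comes through unharmed.
print(10 ** 9 % 2 ** 32)   # 1000000000
```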

One obvious solution to try is to pass int(iters) instead of xrange(int(iters)), and let calculate_pi create the xrange from its argument. (Note: In some cases an obvious transformation like this can hurt performance, maybe badly. But in this case, it's probably slightly better if anything—a simpler object to pass, and you're parallelizing the xrange construction—and of course the difference is so tiny it probably won't matter. Just make sure to think before blindly transforming.)

And a quick test shows that this now works:

import sys
from multiprocessing import Pool

def doit(n):
    print xrange(n)

if __name__ == "__main__":
    procs = int(sys.argv[1])
    iters = int(float(sys.argv[2]))
    p = Pool(processes=procs)
    for points in p.map(doit, [iters] * procs):
        pass

Then:

$ ./multitest.py 2 1E10
xrange(10000000000)
xrange(10000000000)

However, you will still run into a larger limit:

$ ./multitest.py 2 1E100
OverflowError: Python int too large to convert to C long

Again, it's the same basic problem. One way to solve that is to pass the arg all the way down as a string, and do the int(float(a)) inside the subprocesses.
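A sketch of that string-passing approach (Python 3 syntax; the worker and argument names are illustrative):

```python
from multiprocessing import Pool

def calculate_pi(iters_str):
    # Only a short string crosses the process boundary; the conversion
    # to a possibly huge int happens inside the worker itself.
    iters = int(float(iters_str))
    return iters  # a real worker would run its sampling loop here

if __name__ == "__main__":
    with Pool(processes=2) as p:
        results = p.map(calculate_pi, ["1E100"] * 2)
    # Both workers received the full value, with no pickling limit hit.
    print(results == [int(float("1E100"))] * 2)  # True
```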

As a side note: The reason I'm doing iters = int(float(sys.argv[2])) instead of just iters = float(sys.argv[2]) and then using int(iters) later is to avoid accidentally using the float iters value later on (as the OP's version does, in computing total and therefore total_in / total).

And keep in mind that if you get to big enough numbers, you run into the limits of the C double type: 1E23 is typically 99999999999999991611392, not 100000000000000000000000.
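That limit is easy to see without multiprocessing at all (Python 3 syntax):

```python
# The closest IEEE-754 double to 1E23 is not 10**23.
as_float = float("1E23")
print(int(as_float))              # 99999999999999991611392
print(int(as_float) == 10 ** 23)  # False
# Smaller powers of ten, like 1E10, are still exactly representable.
print(int(float("1E10")) == 10 ** 10)  # True
```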
