How to use IPython.parallel map() with generators as input to function

Problem description

I am trying to use map from IPython.parallel. The inputs to the function I wish to parallelize are generators. Because of their size, converting the generators to lists would not fit in memory. See the code below:

from itertools import product
from IPython.parallel import Client

c = Client()
v = c[:]
c.ids

def stringcount(longstring, substrings):
    scount = [longstring.count(s) for s in substrings]
    return scount

substrings = product('abc', repeat=2)
longstring = product('abc', repeat=3)

# This is what I want to do in parallel
# It should be 'for longs in longstring'; I use range() because it can get long.
for num in range(10):
    longs = next(longstring)
    subs = next(substrings)
    print(subs, longs)
    count = stringcount(longs, subs)
    print(count)

# This does not work, and I understand why.
# I don't know how to fix it while keeping longstring and substrings as
# generators  
v.map(stringcount, longstring, substrings)

for r in v:
    print(r.get())

Solution

You can't use View.map with a generator without walking through the entire generator first. But you can write your own custom function to submit batches of tasks from a generator and wait for them incrementally. I don't have a more interesting example, but I can illustrate with a terrible implementation of a prime search.
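
The core idea, pulling only a bounded batch from a generator at a time, can be sketched without any cluster. This is a minimal illustration using `itertools.islice` from the standard library; the helper name `take_batch` is hypothetical and is not part of IPython.parallel.

```python
from itertools import islice, product

def take_batch(iterator, n):
    """Pull at most n items from an iterator without touching the rest."""
    return list(islice(iterator, n))

# A lazy stream of 3-letter tuples, as in the question (27 items in total)
longstring = product('abc', repeat=3)

first = take_batch(longstring, 10)   # only 10 items are materialized
second = take_batch(longstring, 10)  # continues where the first batch stopped
rest = take_batch(longstring, 10)    # the final, shorter batch
print(len(first), len(second), len(rest))  # 10 10 7
```

Each batch could then be submitted as tasks, with the next batch pulled only once some of the earlier tasks have finished.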

Start with our token 'data generator':

from math import sqrt

def generate_possible_factors(N):
    """generator for iterating through possible factors for N

    yields 2, then every odd integer from 3 up to sqrt(N)
    """
    if N <= 3:
        return
    yield 2
    f = 3
    last = int(sqrt(N))
    while f <= last:
        yield f
        f += 2

This just generates a sequence of integers to use when testing if a number is prime.

Now our trivial function that we will use as a task with IPython.parallel:

def is_factor(f, N):
    """is f a factor of N?"""
    return (N % f) == 0

And a complete implementation of a prime check using the generator and our factor function:

def dumb_prime(N):
    """dumb implementation of is N prime?"""
    for f in generate_possible_factors(N):
        if is_factor(f, N):
            return False
    return True
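
As a quick sanity check of the serial pieces before anything is parallelized, the three functions above can be run locally. The block repeats their definitions so it is self-contained; the variable names `candidates` and `primes` are only for illustration.

```python
from math import sqrt

def generate_possible_factors(N):
    """Yield 2, then every odd integer from 3 up to sqrt(N)."""
    if N <= 3:
        return
    yield 2
    f = 3
    last = int(sqrt(N))
    while f <= last:
        yield f
        f += 2

def is_factor(f, N):
    """is f a factor of N?"""
    return (N % f) == 0

def dumb_prime(N):
    """Serial check: stops pulling from the generator at the first factor."""
    for f in generate_possible_factors(N):
        if is_factor(f, N):
            return False
    return True

# Candidate factors tried for N = 50
candidates = list(generate_possible_factors(50))
print(candidates)  # [2, 3, 5, 7]

# The same range that is used with the parallel version later on
primes = [N for N in range(900, 1000) if dumb_prime(N)]
print(primes)
```

Because `dumb_prime` returns at the first factor it finds, the generator is only consumed as far as needed, which is the behaviour the parallel version tries to preserve.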

A parallel version that only submits a limited number of tasks at a time:

def parallel_dumb_prime(N, v, max_outstanding=10, dt=0.1):
    """dumb_prime where each factor is checked remotely

    Up to `max_outstanding` factors will be checked in parallel.

    Submission will halt as soon as we know that N is not prime.
    """
    tasks = set()
    # factors is a generator
    factors = generate_possible_factors(N)
    while True:
        try:
            # submit a batch of tasks, with a maximum of `max_outstanding`
            for i in range(max_outstanding - len(tasks)):
                # next() works on generators in both Python 2.6+ and Python 3
                f = next(factors)
                tasks.add(v.apply_async(is_factor, f, N))
        except StopIteration:
            # no more factors to test, stop submitting
            break
        # get the tasks that are done
        ready = set(task for task in tasks if task.ready())
        while not ready:
            # wait a little bit for some tasks to finish
            v.wait(tasks, timeout=dt)
            ready = set(task for task in tasks if task.ready())

        for t in ready:
            # get the result - if True, N is not prime, we are done
            if t.get():
                return False
        # update tasks to only those that are still pending,
        # and submit the next batch
        tasks.difference_update(ready)
    # check the last few outstanding tasks
    for task in tasks:
        if task.get():
            return False
    # checked all candidates, none are factors, so N is prime
    return True

This submits a limited number of tasks at a time, and as soon as we know that N is not prime, we stop consuming the generator.
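
The same bounded-submission pattern can be applied to the `stringcount` function from the question. The sketch below is an illustration under stated assumptions, not the IPython.parallel API: it uses `concurrent.futures` from the Python 3 standard library as a local stand-in so that it runs without a cluster, and `bounded_map` is a hypothetical helper name. With a load-balanced view, `executor.submit(func, *args)` would correspond roughly to `view.apply_async(func, *args)`, with readiness polled as in `parallel_dumb_prime`. (IPython.parallel has since been split out into the separate `ipyparallel` package.)

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from itertools import product

def stringcount(longstring, substrings):
    """Count occurrences of each substring in longstring."""
    return [longstring.count(s) for s in substrings]

def bounded_map(executor, func, arg_stream, max_outstanding=10):
    """Yield func(*args) for each args tuple in arg_stream, in completion order.

    At most max_outstanding tasks are pending at any time, so the input
    generator is never consumed much faster than results come back.
    """
    arg_stream = iter(arg_stream)
    pending = set()
    exhausted = False
    while pending or not exhausted:
        # top up the pending set from the generator
        while not exhausted and len(pending) < max_outstanding:
            try:
                args = next(arg_stream)
            except StopIteration:
                exhausted = True
            else:
                pending.add(executor.submit(func, *args))
        if not pending:
            break
        # block until at least one task finishes, keep the rest pending
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for future in done:
            yield future.result()

longstring = product('abc', repeat=3)
substrings = product('abc', repeat=2)

with ThreadPoolExecutor(max_workers=4) as executor:
    # zip is lazy in Python 3, so neither generator is turned into a list
    results = list(bounded_map(executor, stringcount,
                               zip(longstring, substrings),
                               max_outstanding=4))
print(len(results))
```

Results arrive in completion order rather than input order; if the order matters, one option is to pass an index along with each argument tuple and sort afterwards.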

To use this function:

from IPython import parallel

rc = parallel.Client()
view = rc.load_balanced_view()

for N in range(900, 1000):
    if parallel_dumb_prime(N, view, 10):
        print(N)

A more complete illustration is in a notebook: http://nbviewer.ipython.org/6203173
