Iterating over a single list in parallel in Python

Question

The objective is to do calculations on a single iter in parallel using builtin sum & map functions concurrently. Maybe using (something like) itertools instead of classic for loops to analyze (LARGE) data that arrives via an iterator...

In one simple example case I want to calculate ilen, sum_x & sum_x_sq:

ilen,sum_x,sum_x_sq=iterlen(iter),sum(iter),sum(map(lambda x:x*x, iter))

But without converting the (large) iter to a list (as with iter=list(iter))

n.b. Can this be done using sum & map and without for loops, maybe using the itertools and/or threading modules?

import random

def example_large_data(n=100000000, mean=0, std_dev=1):
  for i in range(n): yield random.gauss(mean,std_dev)

-Edit-

Being VERY specific: I was taking a good look at itertools hoping that there was a dual function like map that could do it. For example: len_x,sum_x,sum_x_sq=itertools.iterfork(iter_x,iterlen,sum,sum_sq)

If I was to be very very specific: I am looking for just one answer, python source code for the "iterfork" procedure.

Recommended answer

You can use itertools.tee to turn your single iterator into three iterators which you can pass to your three functions.

iter0, iter1, iter2 = itertools.tee(input_iter, 3)
ilen, sum_x, sum_x_sq = count(iter0),sum(iter1),sum(map(lambda x:x*x, iter2))

That will work, but the builtin function sum (and map in Python 2) is not implemented in a way that supports parallel iteration. The first function you call will consume its iterator completely, then the second one will consume the second iterator, then the third function will consume the third iterator. Since tee has to store the values seen by one of its output iterators but not all of the others, this is essentially the same as creating a list from the iterator and passing it to each function.
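The buffering described above can be observed directly. The following is a small sketch, not part of the original answer: `traced` is a hypothetical helper that records each value as it is pulled from the source, so we can see that the first `sum` drains the source completely before the second function starts.

```python
import itertools

def traced(values, log):
    """Hypothetical helper: yield each value, recording when it is pulled."""
    for v in values:
        log.append(v)
        yield v

log = []
a, b = itertools.tee(traced(range(5), log), 2)

total = sum(a)                       # consumes iterator a completely
pulled_after_first_sum = list(log)   # snapshot of what the source has produced so far

total_sq = sum(x * x for x in b)     # b is served entirely from tee's internal buffer
```

After the first `sum`, every source value has already been pulled, which means `tee` had to hold all of them for `b`. For a large input this costs about as much memory as building a list.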

Now, if you use generator functions that consume only a single value from their input for each value they output, you might be able to make parallel iteration work using zip. In Python 3, map and zip both return lazy iterators that behave this way. The question is how to make sum into a generator.

I think you can get pretty much what you want by using itertools.accumulate (which was added in Python 3.2). It is a generator that yields a running sum of its input. Here's how you could make it work for your problem (I'm assuming your count function was supposed to be an iterator-friendly version of len):

iter0, iter1, iter2 = itertools.tee(input_iter, 3)

len_gen = itertools.accumulate(map(lambda x: 1, iter0))
sum_gen = itertools.accumulate(iter1)
sum_sq_gen = itertools.accumulate(map(lambda x: x*x, iter2))

parallel_gen = zip(len_gen, sum_gen, sum_sq_gen)  # zip is a generator in Python 3

for ilen, sum_x, sum_x_sq in parallel_gen:
    pass    # the generators do all the work, so there's nothing for us to do here

# ilen, sum_x, sum_x_sq have the right values here (provided input_iter was not empty)
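The same idea can be wrapped in a function that avoids the explicit for loop. This is only a sketch under a few assumptions: the name `parallel_stats` is made up for illustration, a `collections.deque` with `maxlen=1` is used to drain the zipped generators while keeping only the last tuple, and `(0, 0, 0)` is an arbitrary choice for empty input.

```python
import collections
import itertools

def parallel_stats(input_iter):
    """Return (count, sum, sum of squares) in one pass over input_iter.

    Hypothetical helper name; Python 3.2+ is assumed for itertools.accumulate.
    """
    it0, it1, it2 = itertools.tee(input_iter, 3)

    len_gen = itertools.accumulate(1 for _ in it0)          # running count
    sum_gen = itertools.accumulate(it1)                     # running sum
    sum_sq_gen = itertools.accumulate(x * x for x in it2)   # running sum of squares

    # zip advances the three generators in lockstep, so tee only has to
    # buffer the current item; deque(maxlen=1) keeps just the final tuple.
    last = collections.deque(zip(len_gen, sum_gen, sum_sq_gen), maxlen=1)
    return last[0] if last else (0, 0, 0)

ilen, sum_x, sum_x_sq = parallel_stats(iter(range(5)))
```

Because the three branches are consumed together, memory use should stay small no matter how long the input is, unlike calling `sum` on each `tee` branch in turn.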

If you're using Python 2, rather than 3, you'll have to write your own accumulate generator function (there's a pure Python implementation in the itertools documentation), and use itertools.imap and itertools.izip rather than the builtin map and zip functions.
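For reference, a hand-written accumulate might look roughly like the sketch below. It follows the general shape of the equivalent given in the itertools documentation but is not copied from it, and it only covers the two-argument form (an iterable plus an optional binary function).

```python
import operator

def accumulate(iterable, func=operator.add):
    """Yield running totals: accumulate([1, 2, 3]) yields 1, 3, 6."""
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return              # empty input: yield nothing
    yield total
    for element in it:
        total = func(total, element)
        yield total

running = list(accumulate([1, 2, 3, 4]))
```

This version uses no Python 3-only syntax, so it should work as a drop-in replacement for `itertools.accumulate` in the example above when combined with `itertools.imap` and `itertools.izip`.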
