How to return a generator using joblib.Parallel()?

Question

I have a piece of code below where the joblib.Parallel() returns a list.

import numpy as np
from joblib import Parallel, delayed

lst = [[0.0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr = np.array(lst)
w, v = np.linalg.eigh(arr)

def proj_func(i):
    return np.dot(v[:,i].reshape(-1, 1), v[:,i].reshape(1, -1))

proj = Parallel(n_jobs=-1)(delayed(proj_func)(i) for i in range(len(w)))

Instead of a list, how do I return a generator using joblib.Parallel()?

I have updated the code as suggested by @user3666197 in comments below.

import numpy as np
from joblib import Parallel, delayed

lst = [[0.0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr = np.array(lst)
w, v = np.linalg.eigh(arr)

def proj_func(i):
    yield np.dot(v[:,i].reshape(-1, 1), v[:,i].reshape(1, -1))

proj = Parallel(n_jobs=-1)(delayed(proj_func)(i) for i in range(len(w)))

But I get this error:

TypeError: can't pickle generator objects

Am I missing something? How do I fix this? My main goal here is to reduce memory use, because proj can get very large, so I would like to consume each generator in the list one at a time.

Recommended answer

Q : "how do I return a generator using joblib.Parallel?"

Given joblib's purpose and implementation, which focus on distributing code-execution units across a set of spawned, independent processes (motivated by the performance gained from escaping the central GIL lock, which otherwise re-[SERIAL]-ises execution one GIL step after another), all set up through the syntactic constructor joblib.Parallel(...)( delayed()(...) ), my admittedly limited imagination tells me that the most one can achieve is to have the "remotely" executed processes return the requested generator(s) back to main, where joblib assembles them (outside of one's control) into a list.

So, given the initial conditions above and a function fun() injected via delayed( fun )(...) into the joblib.Parallel( n_jobs = ... )-many "remote" processes, the most that can be achieved is to receive a list of generators (provided fun() does indeed return one), not any form of deferred execution wrapped on return as a single generator.
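The TypeError quoted in the question points at a practical limit on this: results coming back from worker processes are pickled, and a generator object cannot be pickled. A minimal stdlib-only sketch of that limit follows; gen_func and is_picklable are hypothetical names used only for illustration, and no joblib call is involved.

```python
import pickle


def gen_func(i):
    # A generator function: calling it returns a generator object.
    yield i * i


def is_picklable(obj):
    # Hypothetical helper: report whether pickle can serialise obj.
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


# The generator object itself cannot be serialised ...
generator_ok = is_picklable(gen_func(3))
# ... but the values it produces can, once they are materialised.
value_ok = is_picklable(list(gen_func(3)))

print(generator_ok, value_ok)
```

This suggests that a function handed to delayed() for process-based workers should return its value rather than yield it.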

If we were indeed pedantic purists, the only way to receive just "a ( one ) generator using joblib.Parallel()" would be to set n_jobs == 1. That would lexically and logically meet the stated goal of returning (but) a (one) generator, yet it would be less efficient and less meaningful than throwing money into the Nile...
