How to return a generator using joblib.Parallel()?
Question
I have a piece of code below where the joblib.Parallel() returns a list.
import numpy as np
from joblib import Parallel, delayed
lst = [[0.0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr = np.array(lst)
w, v = np.linalg.eigh(arr)
def proj_func(i):
    return np.dot(v[:,i].reshape(-1, 1), v[:,i].reshape(1, -1))
proj = Parallel(n_jobs=-1)(delayed(proj_func)(i) for i in range(len(w)))
Instead of a list, how do I return a generator using joblib.Parallel()?
I have updated the code as suggested by @user3666197 in comments below.
import numpy as np
from joblib import Parallel, delayed
lst = [[0.0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr = np.array(lst)
w, v = np.linalg.eigh(arr)
def proj_func(i):
    yield np.dot(v[:,i].reshape(-1, 1), v[:,i].reshape(1, -1))
proj = Parallel(n_jobs=-1)(delayed(proj_func)(i) for i in range(len(w)))
But I get this error:
TypeError: can't pickle generator objects
Am I missing something? How do I fix this? My main gain here is to reduce memory, as proj can get very large, so I would just like to call each generator in the list one at a time.
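As a side illustration ( an editor's sketch, not part of the original post ): joblib has to pickle whatever a worker returns, and a generator object carries live frame state that the standard pickle module cannot serialise, hence the error above. A minimal, joblib-free demonstration:

import pickle

def gen():
    yield 1

try:
    pickle.dumps(gen())   # generator objects are not picklable
except TypeError as exc:
    print(exc)            # e.g. "cannot pickle 'generator' object"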
Answer
Q : "how do I return a generator using
joblib.Parallel
?"
Given the joblib purpose and implementation, focused on distributing code-execution units across a set of spawned, independent processes ( yes, motivated by the performance boost of escaping a central GIL-lock's re-[SERIAL]-ised dancing, one-GIL-step-after-another-GIL-step-after-... ) via the syntactic constructor known as joblib.Parallel(...)( delayed()(...) ), my obviously limited imagination tells me that the maximum achievable is to make the "remotely" executed processes return the requested generator(s) back to main, where joblib assembles them ( out of one's control ) into a list.
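To illustrate that point ( an editor's sketch, not code from the answer ): whatever the delayed callables produce, the Parallel(...)( ... ) call itself always hands the results back as an ordinary, fully materialised list:

from joblib import Parallel, delayed

# the Parallel(...)( delayed( f )( arg ) ... ) constructor-call pattern
# collects the workers' return values into a plain Python list
out = Parallel(n_jobs=2)(delayed(pow)(i, 2) for i in range(4))
print(type(out), out)   # <class 'list'> [0, 1, 4, 9]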
So, given the above set of initial conditions, an achievable maximum is to receive a list of generators, not any form of deferred execution wrapped on return as a single generator, provided the function fun(), set to be injected via delayed( fun )(...) into the joblib.Parallel( n_jobs = ... )-many "remote" processes, will indeed do so.
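One pragmatic reading of that ( my own sketch, not the answer's code ): let each worker return its plain, picklable ndarray and wrap the assembled list in a generator expression on the caller's side, so downstream code consumes one projection at a time. Note that the full list is still materialised first, so peak memory is not reduced:

import numpy as np
from joblib import Parallel, delayed

lst = [[0.0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr = np.array(lst)
w, v = np.linalg.eigh(arr)

def proj_func(i):
    # return the concrete ndarray ( picklable ), not a generator
    return np.dot(v[:,i].reshape(-1, 1), v[:,i].reshape(1, -1))

proj_list = Parallel(n_jobs=-1)(delayed(proj_func)(i) for i in range(len(w)))

# a generator "view" over the already materialised list
proj_gen = (p for p in proj_list)

for p in proj_gen:
    print(p.shape)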
If we were indeed pedantic purists, the only chance to receive but "a ( one ) generator using joblib.Parallel()" would be for n_jobs to be just == 1, which lexically and logically will meet the defined goal --to return (but) a (one) generator--, yet would be less efficient and less meaningful than throwing money into the river Nile...
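A closing side note ( not part of the original answer ): newer joblib releases ( 1.3 and later ) added a return_as parameter to Parallel, which hands results back lazily instead of assembling a list first; a minimal sketch, assuming such a version is installed:

import numpy as np
from joblib import Parallel, delayed

lst = [[0.0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr = np.array(lst)
w, v = np.linalg.eigh(arr)

def proj_func(i):
    return np.dot(v[:,i].reshape(-1, 1), v[:,i].reshape(1, -1))

# requires joblib >= 1.3; results are yielded one by one as they complete,
# so the caller can consume each projection without holding the whole list
proj_gen = Parallel(n_jobs=-1, return_as="generator")(
    delayed(proj_func)(i) for i in range(len(w)))

for p in proj_gen:
    print(p.shape)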