Initializing state on dask-distributed workers
Question
I am trying to do something like:
from dask.distributed import Client

resource = MyResource()

def fn(x):
    something = dosomething(x, resource)
    return something

client = Client()
results = client.map(fn, data)
The issue is that resource is not serializable and is expensive to construct. Therefore I would like to construct it once on each worker and have it available for use by fn.
How do I do this? Or is there some other way to make resource available on all workers?
Recommended answer
You can always construct a lazy resource, something like:
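class GiveAResource():
    # Class-level slot, shared by all instances within one process.
    resource = [None]

    def get_resource(self):
        # Build the expensive resource on first use, then reuse it.
        if self.resource[0] is None:
            self.resource[0] = MyResource()
        return self.resource[0]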
An instance of this will serialise between processes fine, so you can include it as an input to any function to be executed on workers. Calling .get_resource() on it will then return that process's local copy of the expensive resource, which is built on first use (and will be rebuilt on any worker that appears later on).
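As a rough illustration of that behaviour, the sketch below uses the standard pickle module to stand in for shipping the holder to a worker. MyResource here is a hypothetical stand-in (it holds a lock only to make it unpicklable), and fn taking a holder argument is an assumption about how the task function would be written, not something from the question.

```python
import pickle
import threading


class MyResource:
    """Hypothetical stand-in for the expensive, non-serializable resource."""
    instances_built = 0

    def __init__(self):
        MyResource.instances_built += 1
        self._lock = threading.Lock()  # a lock makes instances unpicklable


class GiveAResource:
    # Class-level slot, shared by all instances within one process.
    resource = [None]

    def get_resource(self):
        if self.resource[0] is None:
            self.resource[0] = MyResource()
        return self.resource[0]


def fn(x, holder):
    # On a worker, the first call builds the resource; later calls reuse it.
    resource = holder.get_resource()
    return x * 2, id(resource)


holder = GiveAResource()
# The holder itself carries no instance state, so it pickles cleanly.
clone = pickle.loads(pickle.dumps(holder))
results = [fn(x, clone) for x in range(3)]
```

With dask, the holder would presumably be passed along as an extra argument, for example client.map(fn, data, holder=holder), since keyword arguments to Client.map are forwarded to the function.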
This class would be best defined in an importable module rather than in dynamic code (for example, in a notebook or interactive session).
There is no locking here, so if several threads ask for the resource at the same time before it has been built, each of them may construct it and you will get redundant work.
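If that redundant work matters, one possible variant guards construction with a lock. This is a minimal sketch, not part of the original answer: GiveAResourceLocked and the stand-in MyResource are hypothetical names, and the lock lives on the class so that instances still carry no state of their own when the class is imported from a module.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class MyResource:
    """Hypothetical stand-in for the expensive resource."""
    instances_built = 0

    def __init__(self):
        MyResource.instances_built += 1


class GiveAResourceLocked:
    # Both attributes are class-level, so each process has exactly one of each.
    resource = [None]
    _lock = threading.Lock()

    def get_resource(self):
        # Fast path: skip the lock once the resource exists.
        if self.resource[0] is None:
            with self._lock:
                # Re-check inside the lock: another thread may have built it.
                if self.resource[0] is None:
                    self.resource[0] = MyResource()
        return self.resource[0]


holder = GiveAResourceLocked()
# Simulate several worker threads asking for the resource at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    resources = list(pool.map(lambda _: holder.get_resource(), range(32)))
```

The second check inside the lock is what prevents two threads that both saw None from each building a resource.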