Initializing state on dask-distributed workers

Question

I am trying to do something like

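from dask.distributed import Client

resource = MyResource()  # expensive to construct and not serializable

def fn(x):
    # fn closes over resource, so resource would have to be shipped to the workers
    something = do_something(x, resource)
    return something

client = Client()
results = client.map(fn, data)  # returns a list of futures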

The issue is that resource is not serializable and is expensive to construct. Therefore I would like to construct it once on each worker and have it available to fn there.

How do I do this? Or is there some other way to make resource available on all workers?

Recommended answer

You can always construct a lazy resource, something like

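class GiveAResource:
    # Class-level cache: shared by every instance within one process,
    # and not part of any instance's pickled state.
    resource = [None]

    def get_resource(self):
        # Build the expensive resource on first use only.
        if self.resource[0] is None:
            self.resource[0] = MyResource()
        return self.resource[0]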

An instance of this serialises between processes without trouble, because the cache lives on the class and the instance itself carries no state. You can therefore pass it as an input to any function executed on the workers; calling .get_resource() on it there returns that worker's local copy of the expensive resource, built on first use (including on any worker that appears later on).
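The behaviour described above can be checked without a cluster. The following is a minimal sketch, not the answerer's code: MyResource here is a hypothetical stand-in that only counts how often it is constructed, and a pickle round trip in a single process stands in for sending the provider to a worker.

```python
import pickle


class MyResource:
    """Hypothetical stand-in for the expensive, non-serializable resource."""
    constructed = 0  # number of constructions in this process

    def __init__(self):
        MyResource.constructed += 1


class GiveAResource:
    resource = [None]  # class-level cache, shared within one process

    def get_resource(self):
        if self.resource[0] is None:
            self.resource[0] = MyResource()
        return self.resource[0]


def fn(x, provider):
    # The task receives the lightweight provider, not the resource itself.
    resource = provider.get_resource()
    return (x, id(resource))


provider = GiveAResource()

# The instance has no attributes of its own, so pickling it never touches MyResource.
restored = pickle.loads(pickle.dumps(provider))

# Every call in this process reuses the same cached resource.
results = [fn(x, restored) for x in range(3)]
```

With a real cluster, the provider would be passed to the mapped function as an extra argument in the same way; each worker process would then build its own copy on the first call it handles.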

This class would be best defined in a module rather than dynamic code.

There is no locking here, so if several threads ask for the resource at the same time before it has been built, each of them may construct its own copy and you will get redundant work.
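If the redundant construction matters, one possible refinement is to guard the first construction with a lock. This is a sketch under assumptions, not part of the original answer: GiveAResourceLocked is a hypothetical name, and MyResource is again a stand-in that sleeps briefly to simulate a slow build.

```python
import threading
import time


class MyResource:
    """Hypothetical stand-in for the expensive resource."""
    constructed = 0  # number of constructions in this process

    def __init__(self):
        time.sleep(0.05)  # simulate slow construction
        MyResource.constructed += 1


class GiveAResourceLocked:
    resource = [None]         # class-level cache, as in the original class
    _lock = threading.Lock()  # class-level too, so instances still pickle with no state

    def get_resource(self):
        # Fast path: skip the lock once the resource exists.
        if self.resource[0] is None:
            with self._lock:
                # Re-check inside the lock: another thread may have built it meanwhile.
                if self.resource[0] is None:
                    self.resource[0] = MyResource()
        return self.resource[0]


provider = GiveAResourceLocked()
seen = []


def task():
    seen.append(id(provider.get_resource()))


threads = [threading.Thread(target=task) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All eight threads end up with the same object, and the constructor runs only once. The lock only coordinates threads inside one process; separate worker processes still each build their own copy, which is the intended behaviour here.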
