Setting up a Dask worker with a variable

Question

I would like to distribute a large object (or load it from disk) when a worker starts and put it into a global variable (such as calib_data). Does that work with Dask workers?

Recommended answer

It seems the client method register_worker_callbacks can do what you want in this case. You will still need somewhere to put your variable, since Python has no truly global scope (global names are per module). That somewhere could be any attribute of an imported module, for example, which any function running on the worker can then access. You could also add it as an attribute of the worker instance itself, but I see no obvious reason to do that.

One way that works is to hijack an arbitrarily chosen standard-library module, although I do not particularly recommend this (see below):

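def attach_var(name, value):
    # runs on a worker: store the value as an attribute of the re module
    import re
    setattr(re, name, value)

# client is an existing dask.distributed.Client;
# client.run calls the function on every currently connected worker
client.run(attach_var, 'x', 1)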

def use_var():
    # any function running on a worker can do this, whether it is
    # called via delayed, submit, or client.run
    import re
    return re.x

client.run(use_var)
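
As an alternative to borrowing re, the same idea can be sketched with register_worker_callbacks and a dedicated holder module. This is a minimal sketch under stated assumptions, not a definitive recipe: the module name calib_holder, the loader load_calib_data, and the "gain" field are hypothetical and stand in for your own code. register_worker_callbacks runs the setup function on the workers that are connected now and on workers that join later, which client.run does not do.

```python
import sys
import types

HOLDER = "calib_holder"  # hypothetical module name, used only as a namespace


def load_calib_data():
    # Stand-in for the real loader (for example, reading a file from disk).
    return {"gain": 2.0}


def setup_calib():
    # Runs once on each worker: create (or reuse) the holder module and
    # attach the calibration data to it.
    holder = sys.modules.setdefault(HOLDER, types.ModuleType(HOLDER))
    holder.calib_data = load_calib_data()


def use_calib(x):
    # Any task running in the same worker process can look the holder up.
    return x * sys.modules[HOLDER].calib_data["gain"]


def demo():
    # Requires dask.distributed. processes=False keeps the workers in the
    # local process so the sketch is easy to try out.
    from dask.distributed import Client

    client = Client(processes=False)
    try:
        client.register_worker_callbacks(setup_calib)
        return client.submit(use_calib, 3).result()
    finally:
        client.close()
```

demo() is not called automatically; run it on a machine where dask.distributed is installed. Loading the data inside setup_calib means each worker reads it locally instead of receiving a pickled copy from the client.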

Before going ahead, though, have you already considered delayed(calib_data) or scatter, which will copy your variable to where it's needed? For example:

futures = client.scatter(calib_data, broadcast=True)
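
To show how the scattered data is then used, here is a small sketch, assuming a hypothetical process_stuff function and a dict-shaped calib_data (neither is specified in the question). One detail worth knowing: scatter treats a dict or list as a collection and scatters each element separately, so the sketch wraps the object in a one-element list to get a single future for the whole thing.

```python
def process_stuff(dataset, calib):
    # Hypothetical processing step: scale every value by the calibration gain.
    return [x * calib["gain"] for x in dataset]


def demo_scatter():
    # Requires dask.distributed. processes=False keeps the workers in the
    # local process so the sketch is easy to try out.
    from dask.distributed import Client

    client = Client(processes=False)
    try:
        calib_data = {"gain": 2.0}  # stand-in for the real calibration object
        # Wrap in a list so the dict is sent as one object, not key by key.
        [calib_future] = client.scatter([calib_data], broadcast=True)
        # Passing the future as an argument lets the worker use its own copy.
        result = client.submit(process_stuff, [1, 2, 3], calib_future)
        return result.result()
    finally:
        client.close()
```

demo_scatter() is not called automatically. With this approach the data travels with the task graph, so no module-level state is needed on the workers.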

Or indeed use the normal delayed semantics:

dcalib = dask.delayed(load_calib_data)()
work = dask.delayed(process_stuff)(dataset1, dcalib)
