Dask distributed: how to get the task key ID inside the function being computed?
Question
My computations with dask.distributed create intermediate files whose names include a UUID4 that identifies that chunk of work.
import os
import uuid

pairs = '{}\n{}\n{}\n{}'.format(list1, list2, list3, ...)
file_path = os.path.join(job_output_root, 'pairs',
                         'pairs-{}.txt'.format(uuid.uuid4().hex))
with open(file_path, 'wt') as f:
    f.write(pairs)
At the same time, every task in a dask.distributed cluster has a unique key, so it would be natural to use that key in the file name.
Is that possible?
Recommended answer
There are two ways to approach the problem:
- You determine the uuid and pass it to Dask (implemented)
- Dask determines the key and passes it to your function (available as of version 1.13)
You pass the uuid to Dask
Functions like .submit accept a key= keyword argument where you can specify the key that you want used:
>>> e.submit(inc, 1, key='inc-12345')
<Future: status: pending, key: inc-12345>
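To tie this back to the file-naming problem in the question, one option is to generate the identifier yourself and use the same string as both the task key and the file name. The following is a minimal sketch, assuming a recent dask.distributed in which `Client` plays the role of `e` above; `make_key`, `write_pairs`, and the `/tmp/pairs` directory are hypothetical names chosen for illustration.

```python
import os
import uuid


def make_key(prefix="pairs"):
    """Build a unique task key such as 'pairs-<32 hex chars>' (hypothetical scheme)."""
    return "{}-{}".format(prefix, uuid.uuid4().hex)


def write_pairs(pairs, output_dir, key):
    """Write one chunk of work to a file named after its task key."""
    os.makedirs(output_dir, exist_ok=True)
    file_path = os.path.join(output_dir, "{}.txt".format(key))
    with open(file_path, "wt") as f:
        f.write(pairs)
    return file_path


def main():
    # Imported here so the helpers above stay usable without a cluster.
    from dask.distributed import Client

    client = Client()  # assumption: a local cluster is acceptable
    key = make_key()
    # The same string is passed as an argument and as the Dask key.
    future = client.submit(write_pairs, "a\nb\nc", "/tmp/pairs", key, key=key)
    print(future.key, future.result())


if __name__ == "__main__":
    main()
```

Because Dask treats two tasks with the same key as the same task, each submitted chunk should get a fresh key.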
Similarly, dask.delayed functions support a dask_key_name keyword argument:
>>> value = delayed(inc)(1, dask_key_name='inc-12345')
You get the key from Dask
The worker places contextual information like this into a per-thread global while it executes each task. As of version 1.13, the key is available as follows:
def your_function(...):
    from distributed.worker import thread_state
    key = thread_state.key  # key of the task running in this thread

future = e.submit(your_function, ...)
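Applied to the file-naming problem from the question, the same lookup could be wrapped in a small helper. This is only a sketch: `current_task_key` and `write_pairs` are hypothetical names, and the fallback to a random UUID when no task key is set (for example when the function is called outside a Dask worker) is an assumption added for illustration, not something Dask does itself.

```python
import os
import uuid


def current_task_key(default=None):
    """Return the key of the Dask task running in this thread, or `default`."""
    try:
        from distributed.worker import thread_state
    except ImportError:
        return default
    # Outside a running task the attribute is not set, so fall back.
    return getattr(thread_state, "key", default)


def write_pairs(pairs, output_dir):
    """Write one chunk of work, naming the file after the current task key."""
    key = current_task_key(default=uuid.uuid4().hex)
    # Keys are not guaranteed to be plain strings or safe path components.
    safe_key = str(key).replace(os.sep, "_")
    os.makedirs(output_dir, exist_ok=True)
    file_path = os.path.join(output_dir, "pairs-{}.txt".format(safe_key))
    with open(file_path, "wt") as f:
        f.write(pairs)
    return file_path
```

Submitting `write_pairs` with `.submit` as shown earlier would then produce one file per task, named after the key Dask assigned.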