Get a unique ID for worker in python multiprocessing pool

Question

Is there a way to assign each worker in a Python multiprocessing pool a unique ID, so that a job being run by a particular worker in the pool can know which worker is running it? According to the docs, a Process has a name, but:

The name is a string used for identification purposes only. It has no semantics. Multiple processes may be given the same name.

For my particular use case, I want to run a bunch of jobs on a group of four GPUs, and I need to set the device number of the GPU that each job should run on. Because the jobs are of non-uniform length, I want to be sure that a job never tries to run on a GPU before the previous job on that GPU has completed (so this precludes pre-assigning an ID to each unit of work ahead of time).

Recommended answer

It seems like what you want is simple: multiprocessing.current_process(). For example:

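import multiprocessing

def f(x):
    # Each call reports the pool worker process that is running it
    print(multiprocessing.current_process())
    return x * x

if __name__ == "__main__":  # guard required with the spawn start method (Windows, macOS)
    p = multiprocessing.Pool()
    print(p.map(f, range(6)))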

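Output (from the original Python 2 run; Python 3 formats the repr differently and prefixes worker names with the start method, e.g. ForkPoolWorker-1):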

$ python foo.py 
<Process(PoolWorker-1, started daemon)>
<Process(PoolWorker-2, started daemon)>
<Process(PoolWorker-3, started daemon)>
<Process(PoolWorker-1, started daemon)>
<Process(PoolWorker-2, started daemon)>
<Process(PoolWorker-4, started daemon)>
[0, 1, 4, 9, 16, 25]

This returns the process object itself, so the process can serve as its own identity. You could also call id() on it to get a numerical id. In CPython, this is the memory address of the process object. However, id() is only guaranteed to be unique among objects alive in the same process, and each worker has its own address space, so it is not a reliable cross-process identifier. Finally, you can use the ident or pid property of the process. These are only set once the process has started, which is always the case inside a running worker.
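Here is a minimal Python 3 sketch of those attributes. It assumes a pool of four workers, and whoami is a hypothetical task function. Each task reports the name and pid of the worker that ran it. Because pid is set once the worker has started, it matches os.getpid() inside the task.

```python
import multiprocessing
import os


def whoami(x):
    """Return the squared input plus identifiers of the worker that handled it."""
    proc = multiprocessing.current_process()
    # Inside a running worker, proc.pid is already set and equals os.getpid().
    return x * x, proc.name, proc.pid, os.getpid()


if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(whoami, range(6))
    for square, name, pid, os_pid in results:
        print(square, name, pid, os_pid)
```

Several tasks will usually report the same pid, because a pool reuses its workers. The pid therefore identifies the worker, not the task.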

Furthermore, looking over the source, it seems to me very likely that autogenerated names (as exemplified by the first value in the Process repr strings above) are unique. multiprocessing maintains an itertools.count object in every process, which is used to generate an _identity tuple for any child processes it spawns. So the top-level process produces child processes with single-value ids, those spawn processes with two-value ids, and so on. Then, if no name is passed to the Process constructor, it simply autogenerates the name from the _identity, using ':'.join(...). Pool then alters the name of each worker process using replace, leaving the autogenerated id the same.
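The naming scheme can be re-created in a few lines. This is a toy model, not the real multiprocessing classes: Node and auto_name are hypothetical. The three mechanisms follow the description above: the per-process counter, the ':'.join over the identity tuple, and the replace('Process', 'PoolWorker') step.

```python
from itertools import count


def auto_name(identity, cls_name="Process"):
    """Hypothetical helper: build a default process name from an _identity tuple."""
    return cls_name + "-" + ":".join(str(i) for i in identity)


class Node:
    """Toy stand-in for a process: its own counter numbers the children it creates."""

    def __init__(self, identity=()):
        self.identity = identity
        self._counter = count(1)  # plays the role of the per-process itertools.count

    def child(self):
        # A child's identity is the parent's identity plus the next counter value
        return Node(self.identity + (next(self._counter),))


main = Node()  # top-level process, empty identity
workers = [main.child() for _ in range(4)]
names = [auto_name(w.identity).replace("Process", "PoolWorker") for w in workers]
grandchild = workers[0].child()

print(names)                           # ['PoolWorker-1', 'PoolWorker-2', 'PoolWorker-3', 'PoolWorker-4']
print(auto_name(grandchild.identity))  # Process-1:1
```

These values match the names in the example output. A first-level worker is PoolWorker-1, and the first child created inside it is Process-1:1.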

The upshot of all this is that, although two Processes can share a name (you are free to give several of them the same name when you create them), autogenerated names are unique as long as you don't touch the name parameter. You could also, in theory, use _identity as a unique identifier; but I gather they made that variable private for a reason!
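For the GPU use case in the question, relying on the private _identity has two practical snags. The parent's counter keeps increasing across every process it creates, so a second pool gets numbers 5 to 8. A worker that is replaced also gets a new number. This sketch swaps in a different technique that is not taken from the answer. Device numbers are handed out through a Pool initializer: each worker takes one ID from a multiprocessing.Queue when it starts and keeps it for its whole lifetime. Because a worker runs one task at a time, two jobs never share a GPU concurrently.

The sketch assumes four devices numbered 0 to 3, exactly one worker per device, and workers that are never replaced. NUM_GPUS, init_worker, run_job and _gpu_id are hypothetical names.

```python
import multiprocessing
import os

NUM_GPUS = 4  # assumption: four devices numbered 0..3, as in the question

# Set once per worker by the initializer; module-level so tasks can read it.
_gpu_id = None


def init_worker(gpu_queue):
    """Pool initializer: each worker claims one device ID when it starts."""
    global _gpu_id
    _gpu_id = gpu_queue.get()
    # Hypothetical: a real job might pin the device here, before importing its GPU
    # library, e.g. os.environ["CUDA_VISIBLE_DEVICES"] = str(_gpu_id)


def run_job(job):
    """Stand-in for a GPU job; reports the device owned by its worker."""
    return job, os.getpid(), _gpu_id


if __name__ == "__main__":
    gpu_queue = multiprocessing.Queue()
    for gpu in range(NUM_GPUS):
        gpu_queue.put(gpu)
    with multiprocessing.Pool(NUM_GPUS, initializer=init_worker, initargs=(gpu_queue,)) as pool:
        results = pool.map(run_job, range(10))
    for job, pid, gpu in results:
        print(f"job {job} ran in worker {pid} on GPU {gpu}")
```

If workers can be replaced (for example with maxtasksperchild, or after a crash), a new worker's initializer would block on the now-empty queue. In that case the sketch would need a way to return IDs to the queue.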

An example of the above:

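import multiprocessing

def f(x):
    # A Process created (but not started) inside a worker gets a two-value _identity
    created = multiprocessing.Process()
    current = multiprocessing.current_process()
    print('running:', current.name, current._identity)
    print('created:', created.name, created._identity)
    return x * x

if __name__ == "__main__":  # guard required with the spawn start method (Windows, macOS)
    p = multiprocessing.Pool()
    print(p.map(f, range(6)))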

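Output (from the original Python 2 run; on Python 3 the worker names are prefixed with the start method, e.g. ForkPoolWorker-1):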

$ python foo.py 
running: PoolWorker-1 (1,)
created: Process-1:1 (1, 1)
running: PoolWorker-2 (2,)
created: Process-2:1 (2, 1)
running: PoolWorker-3 (3,)
created: Process-3:1 (3, 1)
running: PoolWorker-1 (1,)
created: Process-1:2 (1, 2)
running: PoolWorker-2 (2,)
created: Process-2:2 (2, 2)
running: PoolWorker-4 (4,)
created: Process-4:1 (4, 1)
[0, 1, 4, 9, 16, 25]
