Get a unique ID for a worker in a Python multiprocessing pool


Question

Is there a way to assign each worker in a Python multiprocessing pool a unique ID, so that a job run by a particular worker can tell which worker is running it? According to the docs, a Process has a name, but:

"The name is a string used for identification purposes only. It has no semantics. Multiple processes may be given the same name."

For my particular use case, I want to run a batch of jobs on a group of four GPUs, and each job needs to set the device number of the GPU it should run on. Because the jobs are of non-uniform length, I want to be sure that no job starts on a GPU before the previous job on that GPU has completed (which rules out pre-assigning a device ID to each unit of work ahead of time).

Recommended answer

It seems that what you want is simple: multiprocessing.current_process(). For example:

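import multiprocessing

def f(x):
    # Report which pool worker is running this job.
    print(multiprocessing.current_process())
    return x * x

if __name__ == "__main__":
    # Python 3 syntax. The output below was recorded under Python 2;
    # Python 3 prefixes worker names with the start method, e.g. ForkPoolWorker-1.
    with multiprocessing.Pool() as p:
        print(p.map(f, range(6)))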

Output:

$ python foo.py 
<Process(PoolWorker-1, started daemon)>
<Process(PoolWorker-2, started daemon)>
<Process(PoolWorker-3, started daemon)>
<Process(PoolWorker-1, started daemon)>
<Process(PoolWorker-2, started daemon)>
<Process(PoolWorker-4, started daemon)>
[0, 1, 4, 9, 16, 25]

This returns the Process object itself, so the process can serve as its own identity. You could also call id() on it for a numerical ID; in CPython this is the memory address of the Process object. Note that id() is only guaranteed to be unique among objects alive in the same interpreter, so values obtained in different worker processes are not guaranteed to differ. Finally, you can use the ident or pid property of the process, which is only set once the process has started (always the case inside a running worker).
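
As a minimal sketch of reading these attributes, the hypothetical helper describe_current() below (not part of multiprocessing) collects the name, pid and ident of whichever process calls it. Called from inside a pool job it would describe the worker; called from the main script it describes the main process.

```python
import multiprocessing
import os


def describe_current():
    """Return identifying details of the process that calls this function.

    describe_current is a hypothetical helper used only for illustration.
    """
    proc = multiprocessing.current_process()
    return {"name": proc.name, "pid": proc.pid, "ident": proc.ident}


if __name__ == "__main__":
    info = describe_current()
    # For the calling process, pid matches what the OS reports.
    print(info, info["pid"] == os.getpid())
```

Of the three values, pid is the one the operating system guarantees to be unique among processes running at the same time.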

Furthermore, looking over the source, it seems very likely to me that autogenerated names (the first value in the Process repr strings above) are unique. multiprocessing maintains an itertools.count object for every process, which is used to generate an _identity tuple for any child processes it spawns. So the top-level process produces child processes with single-value IDs, those children spawn processes with two-value IDs, and so on. Then, if no name is passed to the Process constructor, the name is autogenerated from _identity using ':'.join(...). Pool then alters the name of the process using replace, leaving the autogenerated ID unchanged.

The upshot of all this is that although two Process objects may have the same name, because you may assign the same name to them when you create them, their names are unique if you don't touch the name parameter. Also, you could theoretically use _identity as a unique identifier, but I gather that variable was made private for a reason!
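
A small sketch of that point, using only Process objects that are constructed but never started: autogenerated names differ from one another, while explicitly supplied names can coincide. The function make_names and the name "gpu-job" are arbitrary choices made for illustration.

```python
import multiprocessing


def make_names():
    """Build four unstarted Process objects and return their names."""
    auto_a = multiprocessing.Process()
    auto_b = multiprocessing.Process()
    named_a = multiprocessing.Process(name="gpu-job")
    named_b = multiprocessing.Process(name="gpu-job")
    return auto_a.name, auto_b.name, named_a.name, named_b.name


if __name__ == "__main__":
    auto_a, auto_b, named_a, named_b = make_names()
    print(auto_a != auto_b)    # autogenerated names differ
    print(named_a == named_b)  # explicitly assigned names may collide
```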

An example of the above in action:

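import multiprocessing

def f(x):
    # Create (but never start) a child Process to show its autogenerated name.
    created = multiprocessing.Process()
    current = multiprocessing.current_process()
    # _identity is a private attribute; it is printed here for illustration only.
    print('running:', current.name, current._identity)
    print('created:', created.name, created._identity)
    return x * x

if __name__ == "__main__":
    # Python 3 syntax; there the running names read e.g. ForkPoolWorker-1
    # rather than the Python 2 form PoolWorker-1 recorded below.
    with multiprocessing.Pool() as p:
        print(p.map(f, range(6)))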

Output:

$ python foo.py 
running: PoolWorker-1 (1,)
created: Process-1:1 (1, 1)
running: PoolWorker-2 (2,)
created: Process-2:1 (2, 1)
running: PoolWorker-3 (3,)
created: Process-3:1 (3, 1)
running: PoolWorker-1 (1,)
created: Process-1:2 (1, 2)
running: PoolWorker-2 (2,)
created: Process-2:2 (2, 2)
running: PoolWorker-4 (4,)
created: Process-4:1 (4, 1)
[0, 1, 4, 9, 16, 25]
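
For the GPU use case in the question, worker names are not the only option. A different technique, sketched here under assumptions and not taken from the answer above, is to keep the free device numbers in a shared queue: each job checks one out when it starts and returns it when it finishes, so two jobs never hold the same device at once. The names init_worker, job and run_jobs are hypothetical, and the actual GPU work is replaced by a placeholder computation.

```python
import multiprocessing

_free_gpus = None  # per-worker reference to the shared queue, set by init_worker


def init_worker(gpu_queue):
    """Pool initializer: remember the queue of free device numbers."""
    global _free_gpus
    _free_gpus = gpu_queue


def job(x):
    """Check out a device number, do placeholder work, then return the device."""
    gpu_id = _free_gpus.get()  # blocks until a device is free
    try:
        return gpu_id, x * x   # stand-in for real work on device gpu_id
    finally:
        _free_gpus.put(gpu_id)  # hand the device back even if the work fails


def run_jobs(num_gpus=4, num_jobs=8):
    """Run num_jobs jobs on a pool with one worker per device."""
    gpu_queue = multiprocessing.Queue()
    for device in range(num_gpus):
        gpu_queue.put(device)
    with multiprocessing.Pool(num_gpus, initializer=init_worker,
                              initargs=(gpu_queue,)) as pool:
        return pool.map(job, range(num_jobs))
```

Called from a script under an `if __name__ == "__main__":` guard, run_jobs() returns a list of (device number, result) pairs. Because the device is claimed per job and not per worker, the scheme does not depend on worker names or on the private _identity attribute.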
