Python multiprocessing - does the number of processes in a pool decrease on error?


Problem description

Code:

import multiprocessing
print(f'num cpus {multiprocessing.cpu_count():d}')
import sys; print(f'Python {sys.version} on {sys.platform}')

def _process(m):
    print(m) #; return m
    raise ValueError(m)

args_list = [[i] for i in range(1, 20)]

if __name__ == '__main__':
    with multiprocessing.Pool(2) as p:
        print([r for r in p.starmap(_process, args_list)])

Prints:

num cpus 8
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28) 
[Clang 6.0 (clang-600.0.57)] on darwin
1
7
4
10
13
16
19
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/Users/ubik-mac13/Library/Preferences/PyCharm2018.3/scratches/multiprocess_error.py", line 8, in _process
    raise ValueError(m)
ValueError: 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ubik-mac13/Library/Preferences/PyCharm2018.3/scratches/multiprocess_error.py", line 18, in <module>
    print([r for r in p.starmap(_process, args_list)])
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 298, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 683, in get
    raise self._value
ValueError: 1

Process finished with exit code 1

Increasing the number of processes in the pool to 3 or 4 prints all the odd numbers (possibly out of order):

1
3
5
9
11
7
13
15
17
19

while with 5 processes or more it prints the whole range 1-19. So what is happening here? Do the processes crash after a number of failures?

This is a toy example, of course, but it comes from a real code issue I had: after leaving a multiprocessing pool running steadily for some days, CPU use went down as if some processes had been killed (note the CPU utilization going downhill on 03/04 and 03/06 while there were still plenty of tasks to run):

When the code terminated, it presented me with one multiprocessing.pool.RemoteTraceback (and only one, as here, even though there were many more processes). Bonus question: is this traceback random? In this toy example it is usually ValueError: 1, but sometimes other numbers appear. Does multiprocessing keep the first traceback from the first process that crashes?

Answer

No: only a whole task blows up, not the process itself. The behavior you observed in your toy example is explained by the chunksizes that result from the combination of the number of workers and the length of the iterable. When you grab the function calc_chunksize_info from here, you can see the difference in the resulting chunksizes:

calc_chunksize_info(n_workers=2, len_iterable=20)
# Chunkinfo(n_workers=2, len_iterable=20, n_chunks=7, chunksize=3, last_chunk=2)

calc_chunksize_info(n_workers=5, len_iterable=20)
# Chunkinfo(n_workers=5, len_iterable=20, n_chunks=20, chunksize=1, last_chunk=1) 
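The numbers above can be reproduced without the linked helper: CPython's Pool._map_async computes the default chunksize with a simple divmod over four tasks per worker (the helper function name below is mine; the formula itself is the one used in pool.py):

```python
def chunksize_for(n_workers, len_iterable, factor=4):
    # Mirrors CPython's Pool._map_async default:
    # divmod over factor * n_workers, rounded up when there is a remainder.
    chunksize, extra = divmod(len_iterable, n_workers * factor)
    if extra:
        chunksize += 1
    return chunksize

print(chunksize_for(2, 20))  # -> 3 (7 chunks: six of size 3, a last one of 2)
print(chunksize_for(5, 20))  # -> 1 (20 chunks of size 1)
```

With 2 workers each chunk bundles three taskels, so one raising taskel can take up to two untouched neighbors down with it; with 5 workers every chunk holds a single taskel and nothing else is lost.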

If the chunksize is > 1, all untouched "taskels" (1. Definitions: Taskel) within a task are also lost as soon as the first taskel raises an exception. Handle expectable exceptions directly within your target function, or write an additional error-handling wrapper to prevent that.

When the code terminated, it presented me with one multiprocessing.pool.RemoteTraceback (and only one, as here, even though there were many more processes). Bonus question: is this traceback random? In this toy example it is usually ValueError: 1, but sometimes other numbers appear. Does multiprocessing keep the first traceback from the first process that crashes?

The worker processes get tasks from a shared queue. Reading from the queue is sequential, so task 1 will always be read before task 2. The order in which results become ready in the workers, however, is not predictable. There are a lot of hardware- and OS-dependent factors in play, so yes, the traceback is random in the sense that the order of results is random, since the (stringified) traceback is part of the result being sent back to the parent. The results are also sent back over a shared queue, and Pool internally handles returned tasks just-in-time. If a task returns unsuccessfully, the whole job is marked as unsuccessful and further arriving tasks are discarded. Only the first retrieved exception gets reraised in the parent, as soon as all tasks within the job have returned.

