Using more worker processes than there are cores


Question

This example from PyMOTW demonstrates using multiprocessing.Pool() where the processes argument (the number of worker processes) is set to twice the number of cores on the machine:

pool_size = multiprocessing.cpu_count() * 2

(Otherwise the class defaults to just cpu_count().)
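As a quick sanity check, the two pool sizes under discussion can be printed directly (a minimal sketch; note that cpu_count() reports logical cores, which on hyper-threaded machines is already double the physical core count):

```python
import multiprocessing

# cpu_count() reports the number of logical cores visible to the OS
cores = multiprocessing.cpu_count()
print(f"default pool size: {cores}")
print(f"doubled pool size: {cores * 2}")
```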

Is there any validity to this? What is the effect of creating more workers than there are cores? Is there ever a case for doing this, or does it just impose extra overhead in the wrong direction? I am curious why it appears consistently in examples from what I consider a reputable site.

In an initial test, it actually seems to slow things down a bit:

$ python -m timeit -n 25 -r 3 'import double_cpus; double_cpus.main()'
25 loops, best of 3: 266 msec per loop
$ python -m timeit -n 25 -r 3 'import default_cpus; default_cpus.main()'
25 loops, best of 3: 226 msec per loop

double_cpus.py:

import multiprocessing

def do_calculation(n):
    for i in range(n):
        i ** 2

def main():
    with multiprocessing.Pool(
        processes=multiprocessing.cpu_count() * 2,
        maxtasksperchild=2,
    ) as pool:
        pool.map(do_calculation, range(1000))

default_cpus.py:

import multiprocessing

def do_calculation(n):
    for i in range(n):
        i ** 2

def main():
    # `processes` will default to cpu_count()
    with multiprocessing.Pool(
        maxtasksperchild=2,
    ) as pool:
        pool.map(do_calculation, range(1000))

Answer

Doing this can make sense if your job is not purely CPU-bound but also involves some I/O.

The computation in your example is also too short for a reasonable benchmark; the overhead of simply creating more processes in the first place dominates.
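To see how much of a short benchmark is just pool startup, one can time creating and tearing down the pool alone (a minimal sketch; measure_pool_startup is a hypothetical helper, and the absolute numbers vary by platform and start method):

```python
import time
import multiprocessing

def measure_pool_startup(n_procs):
    # Time creating and tearing down a pool of n_procs workers.
    # The mapped work is trivial, so startup/teardown cost dominates.
    start = time.perf_counter()
    with multiprocessing.Pool(processes=n_procs) as pool:
        pool.map(abs, range(10))
    return time.perf_counter() - start

if __name__ == "__main__":
    cores = multiprocessing.cpu_count()
    print(f"{cores} workers:  {measure_pool_startup(cores):.3f} s")
    print(f"{cores * 2} workers: {measure_pool_startup(cores * 2):.3f} s")
```

Doubling the worker count roughly doubles this fixed cost, which is why a benchmark whose actual work takes only a few hundred milliseconds can make the larger pool look slower.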

I modified your calculation to iterate over a range of 10M, checking an if-condition along the way and taking a nap whenever it evaluates to True, which happens n_sleep times. That way a total sleep of sleep_sec_total is injected into the computation.

# default_cpus.py
import time
import multiprocessing


def do_calculation(iterations, n_sleep, sleep_sec):
    for i in range(iterations):
        if i % (iterations // n_sleep) == 0:
            time.sleep(sleep_sec)


def main(sleep_sec_total):

    iterations = int(10e6)
    n_sleep = 100
    sleep_sec = sleep_sec_total / n_sleep
    tasks = [(iterations, n_sleep, sleep_sec)] * 20

    with multiprocessing.Pool(
        maxtasksperchild=2,
    ) as pool:
        pool.starmap(do_calculation, tasks)


# double_cpus.py
...

def main(sleep_sec_total):

    iterations = int(10e6)
    n_sleep = 100
    sleep_sec = sleep_sec_total / n_sleep
    tasks = [(iterations, n_sleep, sleep_sec)] * 20

    with multiprocessing.Pool(
        processes=multiprocessing.cpu_count() * 2,
        maxtasksperchild=2,
    ) as pool:
        pool.starmap(do_calculation, tasks)

I ran the benchmark with sleep_sec_total=0 (purely CPU-bound) and with sleep_sec_total=2 for both modules.

Results with sleep_sec_total=0:

$ python -m timeit -n 5 -r 3 'import default_cpus; default_cpus.main(0)'
5 loops, best of 3: 15.2 sec per loop

$ python -m timeit -n 5 -r 3 'import double_cpus; double_cpus.main(0)'
5 loops, best of 3: 15.2 sec per loop

Given a reasonable computation size, you'll observe close to no difference between default and doubled worker counts for a purely CPU-bound task. Here it happened that both tests had the same best time.

Results with sleep_sec_total=2:

$ python -m timeit -n 5 -r 3 'import default_cpus; default_cpus.main(2)'
5 loops, best of 3: 20.5 sec per loop
$ python -m timeit -n 5 -r 3 'import double_cpus; double_cpus.main(2)'
5 loops, best of 3: 17.7 sec per loop

Now, with 2 seconds of sleep added as a stand-in for I/O, the picture looks different: using twice as many processes gave a speedup of about 3 seconds compared to the default.
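The effect can be reproduced in miniature with tasks that only sleep (a sketch; io_bound_task and run are hypothetical names, and the timings assume the sleeps dominate process-startup cost). While one worker blocks, an extra worker can occupy the otherwise idle core:

```python
import time
import multiprocessing

def io_bound_task(sleep_sec):
    # Stand-in for I/O: the worker blocks without burning CPU.
    time.sleep(sleep_sec)

def run(n_procs, n_tasks=8, sleep_sec=0.3):
    # Wall-clock time to push n_tasks sleeping tasks through n_procs workers.
    start = time.perf_counter()
    with multiprocessing.Pool(processes=n_procs) as pool:
        pool.map(io_bound_task, [sleep_sec] * n_tasks)
    return time.perf_counter() - start

if __name__ == "__main__":
    # 8 sleeping tasks: 2 workers need ~4 sequential rounds,
    # while 8 workers finish in ~1 round.
    print(f"2 workers: {run(2):.2f} s")
    print(f"8 workers: {run(8):.2f} s")
```

The same reasoning caps the benefit: once there are enough workers that some runnable process is always available per core, adding more only adds scheduling and memory overhead.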

