How to find the ideal number of parallel processes to run with Python multiprocessing?


Question

I am trying to find the correct number of parallel processes to run with Python multiprocessing.

The script below was run on an 8-core, 32 GB machine (Ubuntu 18.04). Only system processes and basic user processes were running during the tests.

I tested multiprocessing.Pool with apply_async using the following script:

from multiprocessing import current_process, Pool, cpu_count
from datetime import datetime
import time

num_processes = 1 # vary this

print(f"Starting at {datetime.now()}")
start = time.perf_counter()

print(f"# CPUs = {cpu_count()}") # 8
num_procs = 5 * cpu_count() # 40


def cpu_heavy_fn():
    s = time.perf_counter()
    print(f"{datetime.now()}: {current_process().name}")
    x = 1
    for i in range(1, int(1e7)):
        x = x * i
        x = x / i
    t_taken = round(time.perf_counter() - s, 2)
    return t_taken, current_process().name


pool = Pool(processes=num_processes)

multiple_results = [pool.apply_async(cpu_heavy_fn, ()) for i in range(num_procs)]
results = [res.get() for res in multiple_results]
for r in results:
    print(r[0], r[1])

print(f"Done at {datetime.now()}")
print(f"Time taken = {time.perf_counter() - start}s")

The results were as follows:

num_processes total_time_taken (s)
1 28.25
2 14.28
3 10.2
4 7.35
5 7.89
6 8.03
7 8.41
8 8.72
9 8.75
16 8.7
40 9.53

The following makes sense to me:

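  • Running one process at a time takes about 0.7 s per task, so running 40 tasks takes about 28 s, which matches the result above.
  • Running 2 processes at a time should halve the time, and it does (about 14 s).
  • Running 4 processes at a time should halve the time again, and it does (about 7 s).
  • Increasing parallelism beyond the number of cores (8) should degrade performance because of CPU contention, and this is observed (to some extent).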
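The arithmetic behind these expectations can be sketched as a simple "waves" model: if every worker had a full core to itself and there were no overhead, 40 tasks of about 0.7 s each would finish in ceil(40 / workers) waves. The snippet below is only an illustration of that assumption; TASKS, PER_TASK_S and ideal_time are names introduced here, and the measured values are copied from the results table.

```python
import math

TASKS = 40         # num_procs in the script above (5 * 8 CPUs)
PER_TASK_S = 0.7   # approximate single-task time quoted in the question

# Measured totals copied from the results table (workers -> seconds).
measured = {1: 28.25, 2: 14.28, 3: 10.2, 4: 7.35, 5: 7.89,
            6: 8.03, 7: 8.41, 8: 8.72, 9: 8.75, 16: 8.7, 40: 9.53}


def ideal_time(workers, tasks=TASKS, per_task=PER_TASK_S):
    """Wall time assuming each worker owns a full core and tasks run in waves."""
    waves = math.ceil(tasks / workers)
    return waves * per_task


for workers, actual in measured.items():
    ideal = ideal_time(workers)
    speedup = measured[1] / actual  # relative to the single-worker run
    print(f"{workers:>2} workers: ideal {ideal:5.2f}s, "
          f"measured {actual:5.2f}s, speedup {speedup:4.2f}x")
```

Under this model the measured times track the ideal closely up to 4 workers and then stop improving, which is the gap the question asks about.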

What does not make sense:

  • Why is running 8 in parallel not twice as fast as running 4 in parallel, i.e. why does it not take about 3.5 s?
  • Why is running 5 to 8 in parallel worse than running 4 at a time? There are 8 cores, so why is the overall run time worse? (When running 8 in parallel, htop showed all CPUs at nearly 100% utilization. When running 4 in parallel, only 4 of them were at 100%, which makes sense.)

Recommended answer

The most likely cause is that you are running the program on a CPU that uses simultaneous multithreading (SMT), better known as hyper-threading on Intel CPUs. To quote Wikipedia: for each processor core that is physically present, the operating system addresses two virtual (logical) cores and shares the workload between them when possible. That is what is happening here.

Your OS reports 8 cores, but in truth these are 4 physical cores with SMT. The task is clearly CPU-bound, so increasing the number of processes beyond the number of physical cores brings no benefit, only the overhead of multiprocessing. That is why you see an almost linear improvement in performance until you reach the number of physical (!) cores (4), and then a decline once cores have to be shared for this very CPU-intensive task.
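One way to check this on Linux is to compare the logical CPU count with the number of distinct physical cores listed in /proc/cpuinfo. The following is a minimal sketch, not part of the original answer: count_physical_cores and suggested_pool_size are hypothetical helper names, and it assumes a /proc/cpuinfo that exposes "physical id" and "core id" fields (some VMs and non-x86 systems do not), falling back to the logical count otherwise.

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count unique (physical id, core id) pairs in /proc/cpuinfo-style text."""
    cores = set()
    physical_id = core_id = None
    # The extra "" guarantees the last record is flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends the record of one logical CPU.
            if core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores)


def suggested_pool_size():
    """Physical core count if detectable, otherwise the logical CPU count."""
    try:
        with open("/proc/cpuinfo") as f:
            physical = count_physical_cores(f.read())
    except OSError:
        physical = 0  # not Linux, or /proc is unavailable
    return physical or os.cpu_count() or 1


if __name__ == "__main__":
    print(f"logical CPUs: {os.cpu_count()}")
    print(f"suggested Pool size for CPU-bound work: {suggested_pool_size()}")
```

If the third-party psutil package is installed, psutil.cpu_count(logical=False) reports the physical core count more portably. Either number is only a starting point for CPU-bound work; timing a few pool sizes, as the question does, remains the reliable check.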
