multiprocessing pool - memory usage


Problem Description


I wrote a script that I deploy on an HPC node with 112 cores, so 112 processes run at a time until all 400 needed tasks are completed (node_combinations is a list of 400 tuples). The relevant snippet of code is below:

import datetime
import logging
from multiprocessing import Pool

# Parallel Path Probability Calculation
# =====================================
node_combinations = [(i, j) for i in g.nodes for j in g.nodes]
pool = Pool()
start = datetime.datetime.now()
logging.info("Start time: %s", start)
print("Start time: ", start)
pool.starmap(g._print_probability_path_ij, node_combinations)
end = datetime.datetime.now()
print("End time: ", end)
print("Run time: ", end - start)
logging.info("End time: %s", end)
logging.info("Total run time: %s", end - start)
pool.close()
pool.join()

I monitored the performance by running htop and observed the following. Initially, all 112 cores work at 100%. Eventually, since some tasks are shorter than others, I am left with a smaller number of cores working at 100%. Finally, all processes are shown as sleeping.

I believe the problem is that some of these processes (the ones that take longer, about 20 out of 400) require a lot of memory. When memory runs short, those processes go to sleep, and since the memory is never freed, they stay asleep. These are my questions:

  1. Once a process finishes, are the resources (read: memory) freed, or do they remain occupied until all processes finish? In other words, once I have only 20 cores working (because the others already processed all the shorter tasks), do they have access to all the memory, or only to the memory not used by the rest of the processes?

  2. I've read that maxtasksperchild may help in this situation. How would that work? How can I determine the appropriate number of tasks for each child?

If you wonder why I am asking this, it's because of this passage in the documentation: "New in version 2.7: maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool."

Solution

You should leave at least one core available to the OS and one available to the initiating script; try reducing your pool size, e.g. Pool(110).

Use Pool.imap (or imap_unordered) instead of Pool.map. This iterates over the data lazily rather than loading all of it into memory before processing starts.

Set a value for the maxtasksperchild parameter.
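A minimal sketch combining these three suggestions, assuming the same g object and per-pair method as in the question; the compute_pair wrapper and the value 10 for maxtasksperchild are illustrative choices, not values from the original post. Note that imap_unordered passes one object per task, so the (i, j) tuple is unpacked inside the wrapper instead of relying on starmap:

import itertools
from multiprocessing import Pool

def compute_pair(pair):
    # imap_unordered delivers a single object per task, so unpack (i, j) here.
    i, j = pair
    return g._print_probability_path_ij(i, j)

# Leave a couple of cores free for the OS and the parent script, and recycle
# each worker after 10 tasks so its memory is periodically returned to the OS.
pool = Pool(processes=110, maxtasksperchild=10)

# A generator instead of a list: pairs are produced lazily as workers ask for them.
node_combinations = itertools.product(g.nodes, repeat=2)

for _ in pool.imap_unordered(compute_pair, node_combinations, chunksize=1):
    pass  # results are discarded; each task prints/logs its own output

pool.close()
pool.join()

Because the workers are forked from the parent, they still inherit g at fork time; only the small (i, j) tuples and return values travel between processes.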

When you use a multiprocessing Pool, child processes are created using the fork() system call. Each of those processes starts with a copy of the parent process's memory at that time. Because you load the list of tuples before you create the Pool, every process in the pool has a copy of that data.
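A toy demonstration of that fork-time copy (unrelated to the original script): a list built before the pool is created is duplicated into every worker, and changes made inside a worker never reach the parent.

import os
from multiprocessing import Pool

# Created before the pool, so fork() gives every worker its own copy of it.
data = list(range(1000))

def mutate(i):
    data.append(i)                     # modifies only this worker's inherited copy
    return (os.getpid(), len(data))

if __name__ == "__main__":
    with Pool(2) as pool:
        print(pool.map(mutate, range(4)))
    print(len(data))                   # still 1000 in the parent: the copies are independent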

The linked answer walks through a method of memory profiling so you can see where your memory is going, and when.
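That link is not reproduced here; as one illustrative alternative (an assumption, not the linked method), each task can log its worker's resident memory with the third-party psutil package:

import logging
import os
import psutil  # third-party: pip install psutil

def log_worker_memory(tag):
    # Resident set size (RSS) of the current worker process, in MiB.
    rss_mib = psutil.Process(os.getpid()).memory_info().rss / 2**20
    logging.info("%s: pid=%d rss=%.1f MiB", tag, os.getpid(), rss_mib)

def profiled_pair(pair):
    i, j = pair
    log_worker_memory("before (%s, %s)" % (i, j))
    result = g._print_probability_path_ij(i, j)  # the original per-pair computation
    log_worker_memory("after (%s, %s)" % (i, j))
    return result

profiled_pair can then be handed to imap_unordered in place of the plain wrapper shown earlier.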
