No space left while using Multiprocessing.Array in shared memory


Problem Description

I am using the multiprocessing functions of Python to run my code in parallel on a machine with roughly 500GB of RAM. To share some arrays between the different workers I am creating an Array object:

import ctypes
import multiprocessing

import numpy as np

N = 150
ndata = 10000
sigma = 3
ddim = 3

# Shared buffer of C doubles plus a NumPy view onto the same memory.
shared_data_base = multiprocessing.Array(ctypes.c_double, ndata*N*N*ddim*sigma*sigma)
shared_data = np.ctypeslib.as_array(shared_data_base.get_obj())
shared_data = shared_data.reshape(-1, N, N, ddim*sigma*sigma)
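
For context, a minimal, self-contained sketch (not from the original post) of how such a shared buffer is typically handed to worker processes, each of which re-wraps the same underlying memory; the sizes are shrunk for the demo and the names init_worker, work and worker_view are placeholders:

import ctypes
import multiprocessing

import numpy as np

N, ndata, sigma, ddim = 4, 10, 1, 3  # small demo sizes, not the real ones

def init_worker(base):
    # Runs once per worker: wrap the inherited shared buffer as a NumPy view.
    global worker_view
    worker_view = np.ctypeslib.as_array(base.get_obj())
    worker_view = worker_view.reshape(-1, N, N, ddim*sigma*sigma)

def work(i):
    return worker_view[i].sum()

if __name__ == '__main__':
    base = multiprocessing.Array(ctypes.c_double, ndata*N*N*ddim*sigma*sigma)
    with multiprocessing.Pool(2, initializer=init_worker, initargs=(base,)) as pool:
        print(pool.map(work, range(ndata)))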

This works perfectly for sigma=1, but for sigma=3 one of the hard drives of the machine slowly fills up until there is no free space left, and then the process fails with this exception:

OSError: [Errno 28] No space left on device

Now I have 2 questions:

  1. Why does this code write anything to disk at all? Why isn't it all kept in memory?
  2. How can I solve this problem? Can Python keep the whole array in RAM without ever writing it to the HDD? Or can I change the hard drive to which this array is written?

EDIT: I found something online which suggests that the array is stored in "shared memory". But the /dev/shm device has plenty more free space than /dev/sda1, which is filled up by the code above. Here is the (relevant part of the) strace log of this code.

Edit #2: I think that I have found a workaround for this problem. By looking at the source I found that multiprocessing tries to create a temporary file in a directory which is determined by using

process.current_process()._config.get('tempdir')

Setting this value manually at the beginning of the script

from multiprocessing import process
process.current_process()._config['tempdir'] = '/data/tmp/'

seems to solve this issue. But I think that this is not the best way to solve it. So: are there any other suggestions on how to handle it?
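
One alternative worth noting (a sketch, not from the original post): when no tempdir is configured, multiprocessing falls back to Python's tempfile module to pick its scratch directory, and tempfile honors the TMPDIR environment variable. Setting TMPDIR before anything creates a temporary file should therefore have the same effect; /data/tmp below is just a placeholder path that must already exist on a disk with enough space.

import os

# Must be set before the first temporary file is created, e.g. at the very
# top of the script (or exported in the shell before launching Python).
os.environ['TMPDIR'] = '/data/tmp'

import ctypes
import multiprocessing

shared_data_base = multiprocessing.Array(ctypes.c_double, 10)  # demo size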

Recommended Answer

These data are larger than 500GB. Just shared_data_base would be 826.2GB on my machine by sys.getsizeof() and 1506.6GB by pympler.asizeof.asizeof(). Even if they were only 500GB, your machine would need some of that memory in order to run. This is why the data are going to disk.

import ctypes
import sys

from pympler.asizeof import asizeof

N = 150
ndata = 10000
sigma = 3
ddim = 3

# Size of one Python ctypes double object multiplied by the number of elements.
print(sys.getsizeof(ctypes.c_double(1.0)) * ndata*N*N*ddim*sigma*sigma)
print(asizeof(ctypes.c_double(1.0)) * ndata*N*N*ddim*sigma*sigma)

Note that on my machine (Debian 9), /tmp is the location that fills up. If you find that you must use disk, be certain that the location on disk has enough available space; typically /tmp isn't a large partition.
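
As a quick sanity check (a sketch added here, not part of the original answer), shutil.disk_usage reports how much room the locations mentioned above actually have before the array is allocated:

import shutil

# Free space on the default scratch location (/tmp) and on shared memory (/dev/shm).
for location in ('/tmp', '/dev/shm'):
    usage = shutil.disk_usage(location)
    print(f"{location}: {usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB total")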

