OSError: [Errno 12] Cannot allocate memory when using python multiprocessing Pool

Problem description

I am trying to apply a function to 5 cross validation sets in parallel using Python's multiprocessing and repeat that for different parameter values, like so:

import pandas as pd
import numpy as np
import multiprocessing as mp
from sklearn.model_selection import StratifiedKFold

#simulated datasets
X = pd.DataFrame(np.random.randint(2, size=(3348,868), dtype='int8'))
y = pd.Series(np.random.randint(2, size=3348, dtype='int64'))

#dummy function to apply
def _work(args):
    del(args)

for C in np.arange(0.0,2.0e-3,1.0e-6):
    splitter = StratifiedKFold(n_splits=5)
    with mp.Pool(processes=5) as pool:
        pool_results = \
            pool.map(
                func=_work,
                iterable=((C,X.iloc[train_index],X.iloc[test_index]) for train_index, test_index in splitter.split(X, y))
            )

However halfway through execution I get the following error:

Traceback (most recent call last):
  File "mre.py", line 19, in <module>
    with mp.Pool(processes=5) as pool:
  File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

I'm running this on Ubuntu 16.04 with 32Gb of memory, and checking htop during execution it never goes over 18.5Gb, so I don't think I'm running out of memory.
It is definitely due to the splitting of my dataframes with the indexes from splitter.split(X, y), since when I directly pass my dataframes to the Pool object no error is thrown.

I saw this answer that says it might be due to too many file dependencies being created, but I have no idea how I might go about fixing that, and isn't the context manager supposed to help avoid this sort of problem?

Recommended answer

os.fork() makes a copy of a process, so if you're sitting at about 18 GB of usage, and want to call fork, you need another 18 GB. Twice 18 is 36 GB, which is well over 32 GB. While this analysis is (intentionally) naive—some things don't get copied on fork—it's probably sufficient to explain the problem.

The solution is either to make the pools earlier, when less memory needs to be copied, or to work harder at sharing the largest objects. Or, of course, add more memory (perhaps just virtual memory, i.e., swap space) to the system.
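As a rough illustration of the first two suggestions, here is a minimal sketch (my own restructuring of the question's code, not the answerer's implementation): the Pool is created once, before the parameter loop and while the parent process is still small, and only the fold indices are sent to the workers, so the large DataFrame is never pickled per task. The _work body below is a placeholder, like the dummy function in the question.

import pandas as pd
import numpy as np
import multiprocessing as mp
from sklearn.model_selection import StratifiedKFold

# same simulated datasets as in the question
X = pd.DataFrame(np.random.randint(2, size=(3348, 868), dtype='int8'))
y = pd.Series(np.random.randint(2, size=3348, dtype='int64'))

def _work(args):
    # Slice the fork-inherited global X inside the worker instead of
    # shipping DataFrame copies through the pool; only the small index
    # arrays get pickled.
    C, train_index, test_index = args
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    del X_train, X_test  # dummy body, as in the question

if __name__ == '__main__':
    splitter = StratifiedKFold(n_splits=5)
    # Create the pool once, up front, so each os.fork() happens while the
    # parent's memory footprint is still small.
    with mp.Pool(processes=5) as pool:
        for C in np.arange(0.0, 2.0e-3, 1.0e-6):
            pool.map(
                _work,
                [(C, train_index, test_index)
                 for train_index, test_index in splitter.split(X, y)]
            )

On Linux this relies on fork's copy-on-write semantics, so the workers can read the global X without an explicit copy; if the parent is already large by the time the pool must be created, adding swap space as suggested above is the remaining fallback.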
