OSError: [Errno 12] Cannot allocate memory when using python multiprocessing Pool

Problem description

I am trying to apply a function to 5 cross validation sets in parallel using Python's multiprocessing and repeat that for different parameter values, like so:

import pandas as pd
import numpy as np
import multiprocessing as mp
from sklearn.model_selection import StratifiedKFold

#simulated datasets
X = pd.DataFrame(np.random.randint(2, size=(3348,868), dtype='int8'))
y = pd.Series(np.random.randint(2, size=3348, dtype='int64'))

#dummy function to apply
def _work(args):
    del(args)

for C in np.arange(0.0,2.0e-3,1.0e-6):
    splitter = StratifiedKFold(n_splits=5)
    with mp.Pool(processes=5) as pool:
        pool_results = \
            pool.map(
                func=_work,
                iterable=((C,X.iloc[train_index],X.iloc[test_index]) for train_index, test_index in splitter.split(X, y))
            )

However halfway through execution I get the following error:

Traceback (most recent call last):
  File "mre.py", line 19, in <module>
    with mp.Pool(processes=5) as pool:
  File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

I'm running this on Ubuntu 16.04 with 32Gb of memory, and checking htop during execution it never goes over 18.5Gb, so I don't think I'm running out of memory.
It is definitely due to the splitting of my dataframes with the indexes from splitter.split(X, y), since when I directly pass my dataframes to the Pool object no error is thrown.

I saw this answer that says it might be due to too many file dependencies being created, but I have no idea how I might go about fixing that, and isn't the context manager supposed to help avoid this sort of problem?

Recommended answer

os.fork() makes a copy of a process, so if you're sitting at about 18 GB of usage, and want to call fork, you need another 18 GB. Twice 18 is 36 GB, which is well over 32 GB. While this analysis is (intentionally) naive—some things don't get copied on fork—it's probably sufficient to explain the problem.

The solution is either to make the pools earlier, when less memory needs to be copied, or to work harder at sharing the largest objects. Or, of course, add more memory (perhaps just virtual memory, i.e., swap space) to the system.
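As a rough illustration of the first two suggestions, here is a minimal sketch (my own restructuring of the question's code, not the answerer's implementation): the Pool is created once, before the parameter loop and while the parent process is still small, and only the fold indices are sent to the workers, so the large DataFrame is never pickled per task. The _work body below is a placeholder, like the dummy function in the question.

import pandas as pd
import numpy as np
import multiprocessing as mp
from sklearn.model_selection import StratifiedKFold

# same simulated datasets as in the question
X = pd.DataFrame(np.random.randint(2, size=(3348, 868), dtype='int8'))
y = pd.Series(np.random.randint(2, size=3348, dtype='int64'))

def _work(args):
    # Slice the fork-inherited global X inside the worker instead of
    # shipping DataFrame copies through the pool; only the small index
    # arrays get pickled.
    C, train_index, test_index = args
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    del X_train, X_test  # dummy body, as in the question

if __name__ == '__main__':
    splitter = StratifiedKFold(n_splits=5)
    # Create the pool once, up front, so each os.fork() happens while the
    # parent's memory footprint is still small.
    with mp.Pool(processes=5) as pool:
        for C in np.arange(0.0, 2.0e-3, 1.0e-6):
            pool.map(
                _work,
                [(C, train_index, test_index)
                 for train_index, test_index in splitter.split(X, y)]
            )

On Linux this relies on fork's copy-on-write semantics, so the workers can read the global X without an explicit copy; if the parent is already large by the time the pool must be created, adding swap space as suggested above is the remaining fallback.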
