Python multiprocessing - Debugging OSError: [Errno 12] Cannot allocate memory

Problem description

I'm facing the following issue. I'm trying to parallelize a function that updates a file, but I cannot start the Pool() because of OSError: [Errno 12] Cannot allocate memory. I've started looking around on the server, and it's not as if I'm using an old, weak machine or am actually out of memory: htop doesn't show memory being exhausted, and free -m shows I have plenty of RAM available in addition to the ~7 GB of swap. The files I'm trying to work with aren't that big either. I'll paste my code (and the stack trace) below; the sizes are as follows:

- The predictionmatrix DataFrame takes up about 80 MB according to pandas' DataFrame.memory_usage().
- The file geo.geojson is 2 MB.

How do I go about debugging this? What can I check and how? Thank you for any tips/tricks!
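One place to start is to read, from inside the same Python process, the figures that htop and free -m report, together with the peak memory of the process itself; that process is the one os.fork() has to copy when the pool starts its workers. The sketch below assumes Linux (it reads /proc/meminfo, and ru_maxrss is reported in kilobytes there); the helper name memory_snapshot is hypothetical.

```python
import os
import resource


def memory_snapshot(meminfo_path="/proc/meminfo"):
    """Return selected memory figures in MB (hypothetical helper, Linux-only)."""
    snapshot = {}
    if os.path.exists(meminfo_path):
        with open(meminfo_path) as f:
            for line in f:
                # Lines look like "MemAvailable:   12345678 kB"
                key, _, rest = line.partition(":")
                fields = rest.split()
                # Keep only values expressed in kB and convert them to MB
                if len(fields) == 2 and fields[1] == "kB":
                    snapshot[key] = int(fields[0]) // 1024
    # Peak resident set size of this process; ru_maxrss is in kB on Linux
    snapshot["SelfMaxRSS"] = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss // 1024
    return snapshot


if __name__ == "__main__":
    info = memory_snapshot()
    for key in ("MemTotal", "MemAvailable", "SwapFree", "SelfMaxRSS"):
        print(key, info.get(key), "MB")
```

Calling it in the notebook right before creating the Pool shows how much memory is available at that moment and how large (at its peak) the kernel process has grown over the session.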

Code:

import json
from functools import partial
from multiprocessing import Pool

def parallelUpdateJSON(paramMatch, predictionmatrix, data):
    # Set each feature's opacity from the matching row of predictionmatrix
    for feature in data['features']:
        currentfeature = predictionmatrix[(predictionmatrix['SId']==feature['properties']['cellId']) & paramMatch]
        if (len(currentfeature) > 0):
            feature['properties'].update({"style": {"opacity": currentfeature.AllActivity.item()}})
        else:
            feature['properties'].update({"style": {"opacity": 0}})

def writeGeoJSON(weekdaytopredict, hourtopredict, predictionmatrix):
    with open('geo.geojson') as f:
        data = json.load(f)
    # Boolean mask with one entry per row of predictionmatrix
    paramMatch = (predictionmatrix['Hour']==hourtopredict) & (predictionmatrix['Weekday']==weekdaytopredict)
    pool = Pool()  # the OSError is raised here, while the workers are forked
    func = partial(parallelUpdateJSON, paramMatch, predictionmatrix)
    # Note: iterating a dict yields its keys, so each task receives a key of data, not a feature
    pool.map(func, data)
    pool.close()
    pool.join()

    with open('output.geojson', 'w') as outfile:
        json.dump(data, outfile)
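Separately from the memory error, two things in this code are worth checking once the pool does start: iterating over the dict data yields its keys (for example 'features'), not its features, and changes a worker makes to its own copy of data are not sent back to the parent process, so output.geojson would not receive the new opacities. A minimal sketch of the usual alternative, under stated assumptions: map over data['features'], return a value from each task, and apply the results in the parent. The dict opacity_by_cell is a hypothetical stand-in for the predictionmatrix filtering, and the fork start method is requested explicitly because that is what the traceback shows being used (Unix only).

```python
import multiprocessing
from functools import partial


def feature_opacity(opacity_by_cell, feature):
    # Runs in a worker: compute and *return* the value; mutating `feature`
    # here would only change the worker's copy
    return opacity_by_cell.get(feature["properties"]["cellId"], 0)


def update_features(data, opacity_by_cell, processes=2):
    features = data["features"]  # iterate over the features, not the dict's keys
    func = partial(feature_opacity, opacity_by_cell)
    ctx = multiprocessing.get_context("fork")  # Unix-only, as in the traceback
    with ctx.Pool(processes) as pool:
        opacities = pool.map(func, features)  # results come back in input order
    # Apply the results in the parent, where `data` will be written out
    for feature, opacity in zip(features, opacities):
        feature["properties"]["style"] = {"opacity": opacity}
    return data


if __name__ == "__main__":
    geo = {"type": "FeatureCollection",
           "features": [{"properties": {"cellId": 1}},
                        {"properties": {"cellId": 2}}]}
    print(update_features(geo, {1: 0.5}))
```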

Stack trace:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-428-d6121ed2750b> in <module>()
----> 1 writeGeoJSON(6, 15, baseline)

<ipython-input-427-973b7a5a8acc> in writeGeoJSON(weekdaytopredict, hourtopredict, predictionmatrix)
     14     print("Start loop")
     15     paramMatch = (predictionmatrix['Hour']==hourtopredict) & (predictionmatrix['Weekday']==weekdaytopredict)
---> 16     pool = Pool(2)
     17     func = partial(parallelUpdateJSON, paramMatch, predictionmatrix)
     18     print(predictionmatrix.memory_usage())

/usr/lib/python3.5/multiprocessing/context.py in Pool(self, processes, initializer, initargs, maxtasksperchild)
    116         from .pool import Pool
    117         return Pool(processes, initializer, initargs, maxtasksperchild,
--> 118                     context=self.get_context())
    119 
    120     def RawValue(self, typecode_or_type, *args):

/usr/lib/python3.5/multiprocessing/pool.py in __init__(self, processes, initializer, initargs, maxtasksperchild, context)
    166         self._processes = processes
    167         self._pool = []
--> 168         self._repopulate_pool()
    169 
    170         self._worker_handler = threading.Thread(

/usr/lib/python3.5/multiprocessing/pool.py in _repopulate_pool(self)
    231             w.name = w.name.replace('Process', 'PoolWorker')
    232             w.daemon = True
--> 233             w.start()
    234             util.debug('added worker')
    235 

/usr/lib/python3.5/multiprocessing/process.py in start(self)
    103                'daemonic processes are not allowed to have children'
    104         _cleanup()
--> 105         self._popen = self._Popen(self)
    106         self._sentinel = self._popen.sentinel
    107         _children.add(self)

/usr/lib/python3.5/multiprocessing/context.py in _Popen(process_obj)
    265         def _Popen(process_obj):
    266             from .popen_fork import Popen
--> 267             return Popen(process_obj)
    268 
    269     class SpawnProcess(process.BaseProcess):

/usr/lib/python3.5/multiprocessing/popen_fork.py in __init__(self, process_obj)
     18         sys.stderr.flush()
     19         self.returncode = None
---> 20         self._launch(process_obj)
     21 
     22     def duplicate_for_child(self, fd):

/usr/lib/python3.5/multiprocessing/popen_fork.py in _launch(self, process_obj)
     65         code = 1
     66         parent_r, child_w = os.pipe()
---> 67         self.pid = os.fork()
     68         if self.pid == 0:
     69             try:

OSError: [Errno 12] Cannot allocate memory

Update

According to @robyschek's solution, I've updated my code to:

import json
from functools import partial
from multiprocessing import Pool

g_predictionmatrix = None  # set in each worker process by worker_init

def worker_init(predictionmatrix):
    # Runs once in every worker when the pool starts
    global g_predictionmatrix
    g_predictionmatrix = predictionmatrix

def parallelUpdateJSON(paramMatch, data_item):
    # Read the DataFrame from the per-worker global, not from an argument
    for feature in data_item['features']:
        currentfeature = g_predictionmatrix[(g_predictionmatrix['SId']==feature['properties']['cellId']) & paramMatch]
        if (len(currentfeature) > 0):
            feature['properties'].update({"style": {"opacity": currentfeature.AllActivity.item()}})
        else:
            feature['properties'].update({"style": {"opacity": 0}})

def use_the_pool(data, paramMatch, predictionmatrix):
    pool = Pool(initializer=worker_init, initargs=(predictionmatrix,))
    func = partial(parallelUpdateJSON, paramMatch)
    pool.map(func, data)
    pool.close()
    pool.join()


def writeGeoJSON(weekdaytopredict, hourtopredict, predictionmatrix):
    with open('geo.geojson') as f:
        data = json.load(f)
    paramMatch = (predictionmatrix['Hour']==hourtopredict) & (predictionmatrix['Weekday']==weekdaytopredict)
    use_the_pool(data, paramMatch, predictionmatrix)
    with open('trentino-grid.geojson', 'w') as outfile:
        json.dump(data, outfile)
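The initializer pattern itself can be shown in isolation. In this sketch the names are hypothetical and a plain dict stands in for predictionmatrix: worker_init runs once in each worker and stores the object in a module-level global, and the task function reads that global instead of receiving the object with every task. One caveat, consistent with the traceback: the error is raised by os.fork() while the Pool is being constructed, before any initializer or task runs, so this pattern reduces what is sent per task but cannot by itself make a failing fork succeed.

```python
import multiprocessing

_g_lookup = None  # set in each worker process by worker_init


def worker_init(lookup):
    # Runs once per worker, right after the worker process starts
    global _g_lookup
    _g_lookup = lookup


def opacity_for(cell_id):
    # Reads the per-worker global instead of getting `lookup` with each task
    return _g_lookup.get(cell_id, 0)


def run(cell_ids, lookup, processes=2):
    ctx = multiprocessing.get_context("fork")  # Unix-only, matches the traceback
    with ctx.Pool(processes, initializer=worker_init, initargs=(lookup,)) as pool:
        return pool.map(opacity_for, cell_ids)


if __name__ == "__main__":
    print(run([1, 2, 3], {1: 0.5, 3: 0.25}))  # [0.5, 0, 0.25]
```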

And I still get the same error. Also, according to the documentation, map() divides my data into chunks, so I don't think it should copy my 80 MB once per row. I may be wrong, though... :) Plus, I've noticed that with a smaller input (~11 MB instead of 80 MB) I don't get the error. So I guess I'm trying to use too much memory, but I can't see how 80 MB turns into something 16 GB of RAM can't handle.
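On the chunking question: Pool.map does split the iterable into chunks, so bound arguments are pickled once per chunk rather than once per item. However, in CPython's implementation each chunk is sent to a worker together with the pickled function, and a functools.partial pickles its bound arguments along with it. In the updated code that means paramMatch, a boolean mask with one entry per row of predictionmatrix, travels with every chunk. The sketch below only measures this effect with pickle; the list big is a made-up stand-in for such an argument.

```python
import pickle
from functools import partial


def work(mask, item):
    return item


big = [True] * 1_000_000  # hypothetical stand-in for a large per-row mask
with_mask = partial(work, big)
without_mask = partial(work)

# A partial pickles its function reference *and* its bound arguments,
# so every pickled copy of `with_mask` carries the whole list.
print(len(pickle.dumps(with_mask)))     # on the order of a megabyte
print(len(pickle.dumps(without_mask)))  # only the function reference: small
```

This is a per-task cost rather than the cause of the fork failure, but it is one reason to keep large objects out of the partial.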

Recommended answer

We ran into this a couple of times. According to my sysadmin, there is "a bug" in Unix that raises the same error not only when you are out of memory but also when your process reaches its maximum file-descriptor limit.

In our case we had a file-descriptor leak, and the error raised was OSError: [Errno 12] Cannot allocate memory.

So look at your script and double-check that the real problem isn't that it creates too many file descriptors (FDs).
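A minimal way to follow this advice on Linux is to compare the number of descriptors the process has open with its limit (the soft limit is what ulimit -n reports). Every Pool opens pipes (the traceback shows os.pipe() being called as each worker is started), so pools that are created repeatedly and never cleaned up can accumulate descriptors over a long notebook session. The sketch assumes /proc/self/fd exists; open_fd_count is a hypothetical helper name.

```python
import os
import resource


def open_fd_count():
    # Linux-only: each open descriptor of this process is an entry in /proc/self/fd.
    # The count includes the descriptor os.listdir uses while reading the directory.
    return len(os.listdir("/proc/self/fd"))


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open FDs:", open_fd_count(), "soft limit:", soft, "hard limit:", hard)

# Leak check: count before and after an operation you suspect
before = open_fd_count()
leaked = [open(os.devnull) for _ in range(10)]  # deliberately left open
print("new FDs:", open_fd_count() - before)  # new FDs: 10
for f in leaked:
    f.close()
```

Wrapping the suspect code (for example, a call to writeGeoJSON) between two open_fd_count() calls shows whether it leaves descriptors behind.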
