Memory leak in IPython.parallel module?


Problem description

I'm using IPython.parallel to process a large amount of data on a cluster. The remote function I run looks like:

def evalPoint(point, theta):
    # do some complex calculation
    return (cost, grad)

which gets called by this function:

def eval(theta, client, lview, data):
    async_results = []
    for point in data:
        # evaluate current data point
        ar = lview.apply_async(evalPoint, point, theta)
        async_results.append(ar)

    # wait for all results to come back
    client.wait(async_results)

    # and retrieve their values
    values = [ar.get() for ar in async_results]

    # unzip data from original tuple
    totalCost, totalGrad = zip(*values)

    avgGrad =  np.mean(totalGrad, axis=0)
    avgCost = np.mean(totalCost, axis=0)

    return (avgCost, avgGrad)

If I run the code:

client = Client(profile="ssh")
client[:].execute("import numpy as np")        

lview = client.load_balanced_view()

for i in xrange(100):
    eval(theta, client, lview, data)

the memory usage keeps growing until I eventually run out (76GB of memory). I've simplified evalPoint to do nothing in order to make sure it wasn't the culprit.

The first part of eval was copied from IPython's documentation on how to use the load balancer. The second part (unzipping and averaging) is fairly straight-forward, so I don't think that's responsible for the memory leak. Additionally, I've tried manually deleting objects in eval and calling gc.collect() with no luck.
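As a sanity check, the unzip-and-average step can be reproduced in isolation (a minimal pure-Python sketch with made-up numbers, no IPython or cluster required):

```python
# Simulated engine results: each task returns a (cost, grad) tuple,
# mirroring what evalPoint sends back.
values = [(1.0, [0.5, 0.25]), (3.0, [0.25, 0.75])]

# zip(*values) transposes the list of tuples into one sequence per field.
totalCost, totalGrad = zip(*values)

# Plain-Python equivalents of the np.mean(..., axis=0) calls in eval().
avgCost = sum(totalCost) / len(totalCost)              # 2.0
avgGrad = [sum(g) / len(g) for g in zip(*totalGrad)]   # [0.375, 0.5]
```

The averaging itself only allocates small, short-lived objects, which is consistent with the growth coming from result caching elsewhere rather than from this step.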

I was hoping someone with IPython.parallel experience could point out something obvious I'm doing wrong, or would be able to confirm this is in fact a memory leak.

Some additional facts:

  • I'm using Python 2.7.2 on Ubuntu 11.10
  • I'm using IPython version 0.12
  • I have engines running on servers 1-3, and the client and hub running on server 1. I get similar results if I keep everything on just server 1.
  • The only thing I've found similar to a memory leak for IPython had to do with %run, which I believe was fixed in this version of IPython (also, I am not using %run)

Update

Also, I tried switching logging from memory to SQLiteDB, in case that was the problem, but still have the same problem.

Response (1)

The memory consumption is definitely in the controller (I could verify this by: (a) running the client on another machine, and (b) watching top). I hadn't realized that the non-SQLiteDB backends would still consume memory, so I hadn't bothered purging.

If I use DictDB and purge, I still see the memory consumption go up, but at a much slower rate. It was hovering around 2GB for 20 invocations of eval().

If I use MongoDB and purge, it looks like mongod is taking around 4.5GB of memory and ipcluster about 2.5GB.

If I use SQLite and try to purge, I get the following error:

File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/hub.py", line 1076, in purge_results
  self.db.drop_matching_records(dict(completed={'$ne':None}))
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 359, in drop_matching_records
  expr,args = self._render_expression(check)
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 296, in _render_expression
  expr = "%s %s"%null_operators[op]
TypeError: not enough arguments for format string
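The TypeError at the bottom of that traceback is an ordinary %-formatting mistake: "%s %s" expects two substitution values, but the expression supplies only one. A minimal reproduction of the same failure mode (not IPython's actual code; the fix belongs in sqlitedb.py itself):

```python
# "%s %s" needs two values; passing a single string (rather than a
# 2-tuple) raises the same TypeError seen in _render_expression.
try:
    expr = "%s %s" % "IS NOT NULL"
    error_message = None
except TypeError as e:
    error_message = str(e)
```

So purging with the SQLite backend fails regardless of the data involved; it hits this bug before touching the database.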

So, I think if I use DictDB, I might be okay (I'm going to try a run tonight). I'm not sure if some memory consumption is still expected or not (I also purge in the client like you suggested).

Recommended answer

Is it the controller process that is growing, or the client, or both?

The controller remembers all requests and all results, so the default behavior of storing this information in a simple dict will result in constant growth. Using a db backend (sqlite or preferably mongodb if available) should address this, or the client.purge_results() method can be used to instruct the controller to discard any/all of the result history (this will delete them from the db if you are using one).
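The effect of that default backend can be sketched with a plain dict (an illustration of the growth behavior only, not IPython's actual internals; `InMemoryBackend` and its method signatures are invented here):

```python
# Illustrative sketch: the controller's default in-memory backend behaves
# like a dict keyed by msg_id, gaining one entry per task until purged.
class InMemoryBackend(object):
    def __init__(self):
        self.records = {}

    def add_record(self, msg_id, result):
        self.records[msg_id] = result

    def drop_records(self, msg_ids):
        for msg_id in msg_ids:
            self.records.pop(msg_id, None)

db = InMemoryBackend()
for i in range(1000):
    db.add_record('task-%d' % i, {'result': i})

before = len(db.records)  # 1000 records accumulated, one per task

# Purging (what client.purge_results() triggers on the hub) frees them.
db.drop_records(['task-%d' % i for i in range(1000)])
after = len(db.records)
```

This is why switching to a real db backend or purging regularly keeps the controller's resident memory bounded.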

The client itself caches all of its own results in its results dict, so this, too, will result in growth over time. Unfortunately, this one is a bit harder to get a handle on, because references can propagate in all sorts of directions, and the client-side cache is not affected by the controller's db backend.

This is a known issue in IPython, but for now you should be able to clear the references manually by deleting the entries in the client's results/metadata dicts; and if your view is sticking around, it has its own results dict:

# ...
# and retrieve their values
values = [ar.get() for ar in async_results]

# clear references to the local cache of results:
for ar in async_results:
    for msg_id in ar.msg_ids:
        del lview.results[msg_id]
        del client.results[msg_id]
        del client.metadata[msg_id]

Or, you can purge the entire client-side cache with simple dict.clear():

view.results.clear()
client.results.clear()
client.metadata.clear()

As a side note:

Views have their own wait() method, so you shouldn't need to pass the Client to your function at all. Everything should be accessible via the View, and if you really need the client (e.g. for purging the cache), you can get it as view.client.
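Putting the suggestions together, eval() can drop the client parameter and clear the view's cache after each call. The sketch below uses trivial stand-in classes (FakeView and FakeResult, both invented here) so the call pattern runs without a cluster; with real IPython.parallel, lview would be a LoadBalancedView and you would also clear lview.client.results and lview.client.metadata:

```python
# Stand-ins for IPython.parallel objects, just to exercise the pattern.
class FakeResult(object):
    def __init__(self, value):
        self.msg_ids = [id(self)]
        self._value = value
    def get(self):
        return self._value

class FakeView(object):
    def __init__(self):
        self.results = {}  # mirrors the view's client-side result cache
    def apply_async(self, f, *args):
        ar = FakeResult(f(*args))
        self.results[ar.msg_ids[0]] = ar
        return ar
    def wait(self, ars):
        pass  # a real LoadBalancedView.wait() blocks until tasks finish

def evalPoint(point, theta):
    return (point * theta, point)  # placeholder for the real computation

def eval(theta, lview, data):
    async_results = [lview.apply_async(evalPoint, p, theta) for p in data]
    lview.wait(async_results)      # views have their own wait()
    values = [ar.get() for ar in async_results]
    # Drop cached references so the client-side dicts don't grow;
    # a real run would also clear lview.client.results / .metadata.
    lview.results.clear()
    totalCost, totalGrad = zip(*values)
    return (sum(totalCost), list(totalGrad))
```

After each eval() call the view's cache is empty again, so repeated invocations no longer accumulate references.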
