Memory leak in IPython.parallel module?

Problem description

I'm using IPython.parallel to process a large amount of data on a cluster. The remote function I run looks like:

def evalPoint(point, theta):
    # do some complex calculation
    return (cost, grad)

which gets called by:

def eval(theta, client, lview, data):
    async_results = []
    for point in data:
        # evaluate current data point
        ar = lview.apply_async(evalPoint, point, theta)
        async_results.append(ar)

    # wait for all results to come back
    client.wait(async_results)

    # and retrieve their values
    values = [ar.get() for ar in async_results]

    # unzip data from original tuple
    totalCost, totalGrad = zip(*values)

    avgGrad =  np.mean(totalGrad, axis=0)
    avgCost = np.mean(totalCost, axis=0)

    return (avgCost, avgGrad)

If I run the code:

client = Client(profile="ssh")
client[:].execute("import numpy as np")        

lview = client.load_balanced_view()

for i in xrange(100):
    eval(theta, client, lview, data)

the memory usage keeps growing until I eventually run out (76GB of memory). I've simplified evalPoint to do nothing in order to make sure it wasn't the culprit.

The first part of eval was copied from IPython's documentation on how to use the load balancer. The second part (unzipping and averaging) is fairly straight-forward, so I don't think that's responsible for the memory leak. Additionally, I've tried manually deleting objects in eval and calling gc.collect() with no luck.

I was hoping someone with IPython.parallel experience could point out something obvious I'm doing wrong, or would be able to confirm this is in fact a memory leak.

Some other facts:


  • I'm using Python 2.7.2 on Ubuntu 11.10
  • I'm using IPython version 0.12
  • I have engines running on servers 1-3, and the client and hub running on server 1. I get similar results if I keep everything on just server 1.
  • The only thing I've found similar to a memory leak for IPython had to do with %run, which I believe was fixed in this version of IPython (also, I am not using %run)

Update

Also, I tried switching logging from memory to SQLiteDB in case that was the problem, but the same issue persists.

Response (1)

The memory consumption is definitely in the controller (I could verify this by (a) running the client on another machine, and (b) watching top). I hadn't realized that the non-SQLiteDB backends would still consume memory, so I hadn't bothered purging.

If I use DictDB and purge, I still see the memory consumption go up, but at a much slower rate. It was hovering around 2GB for 20 invocations of eval().

If I use MongoDB and purge, it looks like mongod is taking around 4.5GB of memory and ipcluster about 2.5GB.

If I use SQLite and try to purge, I get the following error:

File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/hub.py", line 1076, in purge_results
  self.db.drop_matching_records(dict(completed={'$ne':None}))
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 359, in drop_matching_records
  expr,args = self._render_expression(check)
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 296, in _render_expression
  expr = "%s %s"%null_operators[op]
TypeError: not enough arguments for format string
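
(Judging from the traceback, the TypeError is raised inside IPython 0.12's own sqlitedb.py: the line expr = "%s %s"%null_operators[op] applies a two-placeholder format string to too few arguments, so purging appears to be broken for the SQLite backend in this release, independent of the calling code.)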

So, I think if I use DictDB, I might be okay (I'm going to try a run tonight). I'm not sure if some memory consumption is still expected or not (I also purge in the client like you suggested).

Accepted answer

Is it the controller process that is growing, or the client, or both?

The controller remembers all requests and all results, so the default behavior of storing this information in a simple dict will result in constant growth. Using a db backend (sqlite or preferably mongodb if available) should address this, or the client.purge_results() method can be used to instruct the controller to discard any/all of the result history (this will delete them from the db if you are using one).
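
For instance, a minimal sketch of the purging approach applied to the question's driver loop might look like the following (this assumes Client.purge_results in IPython 0.12 accepts 'all' to drop the full history; check your version's docstring):

client = Client(profile="ssh")
client[:].execute("import numpy as np")

lview = client.load_balanced_view()

for i in xrange(100):
    eval(theta, client, lview, data)
    # ask the Hub to forget all stored requests/results so the
    # controller's record store stops growing between iterations
    client.purge_results('all')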

The client itself caches all of its own results in its results dict, so this, too, will grow over time. Unfortunately, this one is a bit harder to get a handle on, because references can propagate in all sorts of directions, and the client-side cache is not affected by the controller's db backend.

This is a known issue in IPython, but for now you should be able to clear the references manually by deleting the entries in the client's results/metadata dicts; and if your view is sticking around, it has its own results dict:

# ...
# and retrieve their values
values = [ar.get() for ar in async_results]

# clear references to the local cache of results:
for ar in async_results:
    for msg_id in ar.msg_ids:
        del lview.results[msg_id]
        del client.results[msg_id]
        del client.metadata[msg_id]

Or, you can purge the entire client-side cache with a simple dict.clear():

lview.results.clear()
client.results.clear()
client.metadata.clear()
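
Applied to the question's loop, that would amount to something like this sketch (same names as in the question):

for i in xrange(100):
    eval(theta, client, lview, data)
    # throw away all client-side cached results and metadata
    lview.results.clear()
    client.results.clear()
    client.metadata.clear()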

Side note:

Views have their own wait() method, so you shouldn't need to pass the Client to your function at all. Everything should be accessible via the View, and if you really need the client (e.g. for purging the cache), you can get it as view.client.
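
Putting that together with the cache-clearing above, a sketch of eval() that takes only the view might look like this (untested; it assumes View.wait accepts the same job list as Client.wait, which is what the note above implies):

def eval(theta, lview, data):
    # submit one task per data point
    async_results = [lview.apply_async(evalPoint, point, theta)
                     for point in data]

    # views have their own wait(), so no separate Client argument is needed
    lview.wait(async_results)
    values = [ar.get() for ar in async_results]

    # drop cached references so the client-side dicts don't grow
    client = lview.client
    for ar in async_results:
        for msg_id in ar.msg_ids:
            lview.results.pop(msg_id, None)
            client.results.pop(msg_id, None)
            client.metadata.pop(msg_id, None)

    totalCost, totalGrad = zip(*values)
    return (np.mean(totalCost, axis=0), np.mean(totalGrad, axis=0))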
