这是内存泄漏吗(python中带有sqlalchemy/sqlite的程序) [英] Is this a memory leak ( a program in python with sqlalchemy/sqlite)

查看:97
本文介绍了这是内存泄漏吗(python中带有sqlalchemy/sqlite的程序)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下代码在大量数据(2M)上运行.在完成之前,它吞噬了我所有的4G内存.

    for sample in session.query(CodeSample).yield_per(100):
            for proj in projects:
                    if sample.filename.startswith(proj.abs_source):
                            sample.filename = "some other path"
                            session.add(sample)

然后,我通过简化的数据集运行它并分析了堆. get_rp()给了我以下提示

0: _ --- [-] 47821 (0x9163aec | 0x9165fec | 0x916d6cc | 0x9251414 | 0x925704...
 1: a      [-] 8244 tuple: 0x903ec8c*37, 0x903fcfc*13, 0x9052ecc*46...
 2: aa ---- [S] 3446 types.CodeType: parseresult.py:73:src_path...
 3: ab      [S] 364 type: __builtin__.Struct, _random.Random, sqlite3.Cache...
 4: ac ---- [-] 90 sqlalchemy.sql.visitors.VisitableType: 0x9162f2c...
 5: aca      [S] 11 dict of module: ..sql..., codemodel, sqlalchemy
 6: acb ---- [-] 48 sqlalchemy.sql.visitors.VisitableType: 0x9162f2c...
 7: acba      [S] 9 dict of module: ..sql..., codemodel, sqlalchemy
 8: acbb ---- [-] 45 sqlalchemy.sql.visitors.VisitableType: 0x9165fec...
 9: acbba      [S] 8 dict of module: ..sql..., codemodel, sqlalchemy

我是sqlalchemy的新手.这是内存泄漏吗?谢谢.

解决方案

大多数DBAPI,包括psycopg2和mysql-python,都将所有结果完全加载到内存中,然后再将其释放给客户端. SQLA的yield_per()选项无法解决此问题,下面有一个例外,这就是为什么它通常不是一个非常有用的选项的原因(在完全提取实际行之前开始传输结果的意义上很有用).

此行为的例外是:

  1. 使用不缓冲行的DBAPI.由于OCI的自然工作方式,因此cx_oracle是其中之一.不确定pg8000的行为,还有一个名为OurSQL的新MySQL DBAPI,创建者告诉我它不缓冲行. SQLAlchemy 0.6支持pg8000和OurSQL.
  2. 对于psycopg2,可以使用服务器端光标". SQLAlchemy支持create_engine()标志"server_side_cursors = True",该标志将服务器端游标用于所有行选择操作.但是,由于服务器端游标通常很昂贵,因此会降低较小查询的性能,因此SQLAlchemy 0.6现在使用.execution_options(stream_results = True)在每个语句或每个查询的基础上支持psycopg2的服务器端游标,其中execute_options可用于查询,select(),text()和Connection.当使用yield_per()时,Query对象将调用此选项,因此在0.6 yield_per()与psycopg2结合使用实际上是有用的.

I have the following code runs over a large set of data (2M). It eats up all my 4G mem before finishing.

    for sample in session.query(CodeSample).yield_per(100):
            for proj in projects:
                    if sample.filename.startswith(proj.abs_source):
                            sample.filename = "some other path"
                            session.add(sample)

Then I ran it though a reduced set of data and analyzed heap with heapy. get_rp() gave me the folloing hint

0: _ --- [-] 47821 (0x9163aec | 0x9165fec | 0x916d6cc | 0x9251414 | 0x925704...
 1: a      [-] 8244 tuple: 0x903ec8c*37, 0x903fcfc*13, 0x9052ecc*46...
 2: aa ---- [S] 3446 types.CodeType: parseresult.py:73:src_path...
 3: ab      [S] 364 type: __builtin__.Struct, _random.Random, sqlite3.Cache...
 4: ac ---- [-] 90 sqlalchemy.sql.visitors.VisitableType: 0x9162f2c...
 5: aca      [S] 11 dict of module: ..sql..., codemodel, sqlalchemy
 6: acb ---- [-] 48 sqlalchemy.sql.visitors.VisitableType: 0x9162f2c...
 7: acba      [S] 9 dict of module: ..sql..., codemodel, sqlalchemy
 8: acbb ---- [-] 45 sqlalchemy.sql.visitors.VisitableType: 0x9165fec...
 9: acbba      [S] 8 dict of module: ..sql..., codemodel, sqlalchemy

I'm new to sqlalchemy. Is this a memory leak? Thanks.

解决方案

Most DBAPIs, including psycopg2 and mysql-python, fully load all results into memory before releasing them to the client. SQLA's yield_per() option doesn't work around this, with one exception below, which is why its generally not a very useful option(edit: useful in the sense that it begins streaming results before the actual rows are fully fetched).

The exceptions to this behavior are:

  1. Using a DBAPI that doesn't buffer rows. cx_oracle is one, as a result of the natural way OCI works. Not sure about pg8000's behavior, and there's also a new MySQL DBAPI called OurSQL which I am told by its creator does not buffer rows. pg8000 and OurSQL are supported by SQLAlchemy 0.6.
  2. With psycopg2, a "server side cursor" may be used. SQLAlchemy supports a create_engine() flag "server_side_cursors=True" which uses server side cursors for all row-selecting operations. However, because server side cursors are generally expensive and thus will reduce performance for smaller queries, SQLAlchemy 0.6 now supports psycopg2's server side cursor on a per-statement or per-query basis using .execution_options(stream_results=True), where execution_options is available on Query, select(), text(), and Connection. The Query object calls this option when yield_per() is used, so in 0.6 yield_per() in conjunction with psycopg2 is actually useful.

这篇关于这是内存泄漏吗(python中带有sqlalchemy/sqlite的程序)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆