SQLAlchemy memory hog on select statement
Question
As I understand it, SQLAlchemy treats select statements as iterables in for loops, so a select statement that returns a massive number of rows should not use excessive memory.
I am finding that the following statement on a MySQL table:
for row in my_connections.execute(MyTable.__table__.select()):
    yield row
does not seem to follow this: I run out of available memory and the machine begins thrashing before the first row is yielded. What am I doing wrong?
Recommended answer
The basic MySQLdb cursor fetches the entire query result at once from the server. This can consume a lot of memory and time. Use MySQLdb.cursors.SSCursor when you want to make a huge query and pull results from the server one at a time.
Therefore, try passing connect_args={'cursorclass': MySQLdb.cursors.SSCursor} when creating the engine:
from sqlalchemy import create_engine, MetaData
import MySQLdb.cursors

engine = create_engine(
    'mysql://root:zenoss@localhost/e2',
    connect_args={'cursorclass': MySQLdb.cursors.SSCursor})
meta = MetaData(engine, reflect=True)
conn = engine.connect()
# `s` is the select statement to stream, for example
# MyTable.__table__.select() from the question.
rs = s.execution_options(stream_results=True).execute()
See http://www.sqlalchemy.org/trac/ticket/1089
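Once the driver streams rows, the consuming code also has to avoid collecting them all in a list. Below is a minimal sketch of a batch-wise generator over any DB-API cursor. The helper name iter_rows, the batch size, and the use of the standard-library sqlite3 module (chosen only so the example runs without a MySQL server) are assumptions for illustration and are not part of the original answer. With MySQLdb, the cursor would instead come from a connection created with cursorclass=SSCursor.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows from an executed DB-API cursor, one batch at a time.

    At most `batch_size` rows are held by this generator at once. Whether
    the driver itself buffers the whole result depends on the cursor class.
    """
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


# Hypothetical stand-in data: an in-memory SQLite table instead of MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE huge_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO huge_table (payload) VALUES (?)",
    [("row %d" % i,) for i in range(2500)],
)

cursor = conn.cursor()
cursor.execute("SELECT id, payload FROM huge_table ORDER BY id")

row_count = 0
first_row = None
for row in iter_rows(cursor, batch_size=1000):
    if first_row is None:
        first_row = row
    row_count += 1  # process each row here instead of storing it

conn.close()
```

The generator only bounds memory on the Python side. If the cursor class buffers the full result, as the basic MySQLdb cursor does, the memory problem described in the question remains.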
Note that using SSCursor locks the table until the fetch is complete. This affects other cursors using the same connection: two cursors from the same connection cannot read from the table concurrently.
However, cursors from different connections can read from the same table concurrently.
Here is some code demonstrating the problem:
import MySQLdb
import MySQLdb.cursors as cursors
import threading
import logging
import config

logger = logging.getLogger(__name__)
query = 'SELECT * FROM huge_table LIMIT 200'


def oursql_conn():
    import oursql
    conn = oursql.connect(
        host=config.HOST, user=config.USER, passwd=config.PASS,
        db=config.MYDB)
    return conn


def mysqldb_conn():
    conn = MySQLdb.connect(
        host=config.HOST, user=config.USER,
        passwd=config.PASS, db=config.MYDB,
        cursorclass=cursors.SSCursor)
    return conn


def two_cursors_one_conn():
    """Two SSCursors can not use one connection concurrently"""
    def worker(conn):
        cursor = conn.cursor()
        cursor.execute(query)
        for row in cursor:
            logger.info(row)

    conn = mysqldb_conn()
    threads = [threading.Thread(target=worker, args=(conn, ))
               for n in range(2)]
    for t in threads:
        t.daemon = True
        t.start()
        # Second thread may hang or raise OperationalError:
        # File "/usr/lib/pymodules/python2.7/MySQLdb/cursors.py", line 289, in _fetch_row
        #   return self._result.fetch_row(size, self._fetch_type)
        # OperationalError: (2013, 'Lost connection to MySQL server during query')

    for t in threads:
        t.join()


def two_cursors_two_conn():
    """Two SSCursors from independent connections can use the same table concurrently"""
    def worker():
        conn = mysqldb_conn()
        cursor = conn.cursor()
        cursor.execute(query)
        for row in cursor:
            logger.info(row)

    threads = [threading.Thread(target=worker) for n in range(2)]
    for t in threads:
        t.daemon = True
        t.start()
    for t in threads:
        t.join()


logging.basicConfig(level=logging.DEBUG,
                    format='[%(asctime)s %(threadName)s] %(message)s',
                    datefmt='%H:%M:%S')
two_cursors_one_conn()
two_cursors_two_conn()
Note that oursql is an alternative set of MySQL bindings for Python. oursql cursors are true server-side cursors which fetch rows lazily by default. With oursql installed, if you change
conn = mysqldb_conn()
to
conn = oursql_conn()
then two_cursors_one_conn() runs without hanging or raising an exception.