Python Pandas - Using to_sql to write large data frames in chunks
Question
I'm using Pandas' to_sql function to write to MySQL, which is timing out due to large frame size (1M rows, 20 columns).
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html
Is there a more official way to chunk through the data and write rows in blocks? I've written my own code, which seems to work. I'd prefer an official solution though. Thanks!
import pandas as pd
import sqlalchemy

def write_to_db(engine, frame, table_name, chunk_size):
    start_index = 0
    end_index = chunk_size if chunk_size < len(frame) else len(frame)
    # Replace NaN with None so the database stores NULL
    frame = frame.where(pd.notnull(frame), None)
    if_exists_param = 'replace'
    while start_index != end_index:
        print("Writing rows %s through %s" % (start_index, end_index))
        frame.iloc[start_index:end_index, :].to_sql(con=engine, name=table_name, if_exists=if_exists_param)
        # After the first chunk replaces the table, subsequent chunks append
        if_exists_param = 'append'
        start_index = min(start_index + chunk_size, len(frame))
        end_index = min(end_index + chunk_size, len(frame))

engine = sqlalchemy.create_engine('mysql://...')  # database details omitted
write_to_db(engine, frame, 'retail_pendingcustomers', 20000)
Answer
Update: this functionality has been merged in pandas master and will be released in 0.15 (probably end of September), thanks to @artemyk! See https://github.com/pydata/pandas/pull/8062
So starting from 0.15, you can specify the chunksize argument and e.g. simply do:
df.to_sql('table', engine, chunksize=20000)
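For illustration, here is a self-contained sketch of the chunksize parameter in action. An in-memory SQLite engine stands in for the MySQL server so the example runs anywhere, and the small frame and batch size of 25 are placeholders for the real 1M-row frame and 20000-row chunks:

```python
import pandas as pd
import sqlalchemy

# Small frame standing in for the large one
df = pd.DataFrame({"a": range(100), "b": range(100)})

# In-memory SQLite engine so the sketch needs no MySQL server;
# for MySQL the URL would look like 'mysql://user:pass@host/db'
engine = sqlalchemy.create_engine("sqlite://")

# chunksize splits the INSERTs into batches of 25 rows each,
# instead of one giant statement that can time out
df.to_sql("retail_pendingcustomers", engine,
          if_exists="replace", chunksize=25, index=False)

# Verify that all rows arrived
count = pd.read_sql(
    "SELECT COUNT(*) AS n FROM retail_pendingcustomers", engine)["n"][0]
print(count)  # 100
```

The batching happens inside a single to_sql call, so the manual replace-then-append bookkeeping from the question is no longer needed.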