How to insert a huge pandas DataFrame into a MySQL table with parallel INSERT statements?
Question
I am working on a project where I have to write a DataFrame with millions of rows and about 25 columns, mostly of numeric type. I am using the pandas DataFrame.to_sql function to dump the DataFrame into a MySQL table. I have found that this function creates an INSERT statement that can insert multiple rows at once. This is a good approach, but MySQL limits the length of a query that can be built this way.
Is there a way to run the inserts into the same table in parallel so that I can speed up the process?
Recommended answer
There are a couple of ways to achieve that.
One way is to pass an additional argument when writing to SQL:
df.to_sql("foo_bar", engine, if_exists='append', method='multi')
According to the pandas documentation for DataFrame.to_sql, passing 'multi' as the method argument sends multiple rows in a single INSERT clause, which gives you a bulk insert.
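Because a multi-row INSERT can run into the query length limit mentioned in the question, it usually helps to combine method='multi' with the chunksize argument of to_sql, which caps the number of rows per statement. The following is a minimal sketch, not a tuned setup: it uses an in-memory SQLite connection as a stand-in for MySQL so that it runs anywhere, and the table name, the columns and the chunk size of 200 are made up for illustration.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for the MySQL engine used in this answer.
conn = sqlite3.connect(":memory:")

# Hypothetical sample data: 1000 rows, two numeric columns.
df = pd.DataFrame({"a": range(1000), "b": [x * 0.5 for x in range(1000)]})

# method="multi" packs several rows into each INSERT statement;
# chunksize caps the rows per statement so a single query stays small.
df.to_sql(
    "foo_bar",
    conn,
    if_exists="append",
    index=False,
    method="multi",
    chunksize=200,
)

row_count = conn.execute("SELECT COUNT(*) FROM foo_bar").fetchone()[0]
print(row_count)
```

With MySQL you would pass a SQLAlchemy engine instead of the SQLite connection and pick a chunksize that keeps each statement below the server's packet limit.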
Another solution is to write a custom insert function using multiprocessing.dummy, which offers the multiprocessing API backed by threads. Here is the link to the documentation: https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.dummy
import math
from multiprocessing.dummy import Pool as ThreadPool
...
def insert_df(df, *args, **kwargs):
    nworkers = 4  # number of threads that run inserts in parallel
    chunk = math.floor(df.shape[0] / nworkers)  # rows per chunk
    # (start, stop) row ranges, one per worker
    chunks = [(chunk * i, (chunk * i) + chunk) for i in range(nworkers)]
    # leftover rows when the row count is not divisible by nworkers
    if chunk * nworkers < df.shape[0]:
        chunks.append((chunk * nworkers, df.shape[0]))
    pool = ThreadPool(nworkers)

    def worker(chunk):
        i, j = chunk
        # each thread writes its own slice of the frame
        df.iloc[i:j, :].to_sql(*args, **kwargs)

    pool.map(worker, chunks)
    pool.close()
    pool.join()
....
# use if_exists='append' so the workers do not overwrite each other's rows
insert_df(df, "foo_bar", engine, if_exists='append')
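The slicing in insert_df puts any leftover rows into an extra chunk, so four workers can end up with five tasks, one of them very small. As an alternative, here is a minimal sketch of a helper that spreads the remainder over the first few slices and never returns an empty range. The name chunk_bounds is hypothetical and not part of the original answer.

```python
def chunk_bounds(n_rows, n_workers):
    """Split range(n_rows) into at most n_workers contiguous (start, stop) slices."""
    base, extra = divmod(n_rows, n_workers)
    bounds = []
    start = 0
    for i in range(n_workers):
        # The first `extra` slices each take one additional row.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # fewer rows than workers: skip empty slices
        bounds.append((start, start + size))
        start += size
    return bounds


print(chunk_bounds(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

The returned pairs can be passed to pool.map in place of the chunks list, since the worker already unpacks each item as a (start, stop) pair for df.iloc.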
The second method was suggested at https://stackoverflow.com/a/42164138/5614132.