How to efficiently load a mixed-type pandas DataFrame into an Oracle DB
Question
Happy New Year, everyone!
I'm currently struggling with ETL performance issues while trying to write larger pandas DataFrames (1-2 million rows, 150 columns) into an Oracle database. Even for just 1,000 rows, pandas' default to_sql() method takes well over 2 minutes (see the code snippet below).
My strong hypothesis is that these performance issues are in some way related to the underlying data types (mostly strings). I ran the same job on 1,000 rows of random strings (benchmark: 3 minutes) and on 1,000 rows of large random floats (benchmark: 15 seconds).
import pandas as pd
import sqlalchemy

def _save(self, data: pd.DataFrame):
    # Method of a dataset class; self._load_args holds the connection settings.
    engine = sqlalchemy.create_engine(self._load_args['con'])
    table_name = self._load_args["table_name"]
    if self._load_args.get("schema", None) is not None:
        table_name = self._load_args['schema'] + "." + table_name
    with engine.connect() as conn:
        data.to_sql(
            name=table_name,
            con=conn,  # to_sql() takes the connection as `con`, not `conn`
            if_exists='replace',
            index=False,
            method=None  # oracle dialect does not support multiline inserts
        )
    return
Does anyone here have experience in efficiently loading mixed data into an Oracle database using Python?
Any hints, code snippets and/or API recommendations are very much appreciated.
Cheers
Recommended answer
As you said in your question, you are not able to use method='multi' with your database flavor. This is the key reason the inserts are so slow: the data goes in row by row.
Using SQL*Loader, as suggested by @GordThompson, may be the fastest route for a relatively wide or big table. See "Example on setting up SQL*Loader".
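As a rough illustration of the SQL*Loader route, the sketch below writes the two inputs a conventional load needs: a quoted CSV data file and a control file. The helper name write_sqlldr_inputs, the file names data.csv and load.ctl, and the table and column names are all hypothetical. It assumes the target table already exists, that sqlldr is run from the output directory, and that the string values contain no embedded newlines.

```python
import csv
from pathlib import Path


def write_sqlldr_inputs(rows, columns, table, out_dir):
    """Write data.csv and load.ctl (hypothetical names) for a SQL*Loader run."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    data_path = out_dir / "data.csv"
    ctl_path = out_dir / "load.ctl"

    # Quote every field so commas inside strings do not split columns.
    with data_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, quoting=csv.QUOTE_ALL)
        for row in rows:
            # Missing values are written as empty fields.
            writer.writerow(["" if v is None else v for v in row])

    # Control file: append comma-separated, optionally quoted fields to `table`.
    control = "\n".join([
        "LOAD DATA",
        f"INFILE '{data_path.name}'",
        "APPEND",
        f"INTO TABLE {table}",
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'",
        "TRAILING NULLCOLS",
        "(" + ", ".join(columns) + ")",
    ]) + "\n"
    ctl_path.write_text(control, encoding="utf-8")
    return data_path, ctl_path


# With a DataFrame `df`, the rows could come from:
#   df.itertuples(index=False, name=None)
```

The load itself would then be started from the shell with something like sqlldr userid=... control=load.ctl log=load.log, where the credentials are placeholders to fill in for your environment.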
Another option to consider is cx_Oracle. See "Speed up to_sql() when writing Pandas DataFrame to Oracle database using SqlAlchemy and cx_Oracle".
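One way the cx_Oracle route could look is to bypass to_sql() and hand batches of rows to the cursor's executemany() method. The sketch below is not taken from the linked post. The helper names build_insert and insert_in_batches and the batch size are hypothetical. It assumes the target table already exists and that missing values have been converted to None beforehand.

```python
from itertools import islice


def build_insert(table, columns):
    """Build an INSERT using Oracle-style positional binds (:1, :2, ...)."""
    binds = ", ".join(f":{i}" for i in range(1, len(columns) + 1))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({binds})"


def insert_in_batches(cursor, sql, rows, batch_size=10_000):
    """Pass rows to cursor.executemany() in batches; return rows sent."""
    rows = iter(rows)
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        cursor.executemany(sql, batch)  # one call per batch, not per row
        total += len(batch)
    return total


# Hypothetical wiring (connection details are placeholders):
#   import cx_Oracle
#   conn = cx_Oracle.connect(user, password, dsn)
#   cur = conn.cursor()
#   sql = build_insert("MY_TABLE", list(df.columns))
#   insert_in_batches(cur, sql, df.itertuples(index=False, name=None))
#   conn.commit()
```

Batching keeps memory use bounded for the 1-2 million row case while still avoiding a separate execute() call per row. The right batch size depends on row width and is worth measuring.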