How to speed up loading data from Oracle SQL into a pandas DataFrame


Question

My code looks like this. I use pd.DataFrame.from_records to fill the data into the dataframe, but it takes Wall time: 1h 40min 30s to process the request and load the data from an SQL table with 22 million rows into the df.

# I skipped some of the code, since there are no problems with the extract of the query, it's fast
cur = con.cursor()

def db_select(query): # takes the request text and sends it to the data_frame
    cur.execute(query)
    col = [column[0].lower() for column in cur.description] # parse headers
    df = pd.DataFrame.from_records(cur, columns=col) # fill the data into the dataframe
    return df

Then I pass the SQL query to the function:

frame = db_select("select * from table")

How can I optimize the code to speed up the process?

Recommended answer

Setting a suitable value for cur.arraysize may help tune fetch performance. The default value is 100, and the best value depends on your data and environment, so you need to determine it empirically. One way is to run the same query with several different array sizes and compare the timings, for example:

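from datetime import datetime

arr = [100, 1000, 10000, 100000, 1000000]
for size in arr:
    try:
        cur.prefetchrows = 0  # turn off prefetching so arraysize is the setting being compared
        cur.arraysize = size  # set before execute() so it applies to this fetch
        start = datetime.now()
        cur.execute("SELECT * FROM mytable").fetchall()
        elapsed = (datetime.now() - start).total_seconds()
        print("Process duration for arraysize", size, "is", elapsed, "seconds")
    except Exception as err:
        # very large array sizes may fail, for example by running out of memory
        print("Error", err, "for arraysize", size)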

Then set the value that performed best, for example cur.arraysize = 10000, before calling db_select in your original code.
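As a rough illustration of that last step, here is a minimal sketch of a db_select variant that sets arraysize itself and builds the dataframe from batches returned by fetchmany. The helper name iter_batches, the extra cursor and arraysize parameters, and the in-memory sqlite3 database used for the demo are assumptions made for this example; sqlite3 only stands in for the Oracle connection so the batching logic can be run anywhere, and it says nothing about Oracle fetch timings.

```python
import sqlite3


def iter_batches(cursor, arraysize):
    """Yield lists of rows from an already-executed DB-API cursor."""
    while True:
        rows = cursor.fetchmany(arraysize)
        if not rows:
            return
        yield rows


def db_select(cursor, query, arraysize=10_000):
    """Run `query` and return a DataFrame, fetching `arraysize` rows at a time."""
    import pandas as pd  # imported here so iter_batches can be used without pandas

    cursor.arraysize = arraysize  # set before execute(), as in the answer above
    cursor.execute(query)
    columns = [d[0].lower() for d in cursor.description]  # parse headers
    frames = [pd.DataFrame.from_records(rows, columns=columns)
              for rows in iter_batches(cursor, arraysize)]
    if not frames:  # empty result set: keep the column names
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)


# Stand-in demo (assumption): sqlite3 replaces the Oracle connection.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO mytable VALUES (?, ?)",
                [(i, f"row{i}") for i in range(25)])

cur.execute("SELECT * FROM mytable")
batch_sizes = [len(rows) for rows in iter_batches(cur, 10)]
print(batch_sizes)  # [10, 10, 5]
```

With a real Oracle cursor the call would look like frame = db_select(cur, "select * from table", arraysize=10000), using whichever array size the timing loop favoured. Building the frame batch by batch mainly avoids holding every row as Python tuples at once; whether it is faster than a single from_records call would need to be measured.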
