Progress bar for pandas.DataFrame.to_sql

Question

I want to migrate data from a large CSV file to a sqlite3 database.

My code on Python 3.5 using pandas:

import sqlite3
import pandas as pd

con = sqlite3.connect(DB_FILENAME)
df = pd.read_csv(MLS_FULLPATH)
df.to_sql(con=con, name="MLS", if_exists="replace", index=False)

Is it possible to print the current status (a progress bar) while the to_sql method is executing?

I looked at the article about tqdm, but didn't find how to do this.

Recommended answer

Unfortunately, DataFrame.to_sql does not provide a chunk-by-chunk callback, which tqdm needs in order to update its status. However, you can process the dataframe chunk by chunk:

import sqlite3
import pandas as pd
from tqdm import tqdm

DB_FILENAME = '/tmp/test.sqlite'

def chunker(seq, size):
    # from http://stackoverflow.com/a/434328
    # range replaces xrange, which only exists in Python 2
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def insert_with_progress(df, dbfile):
    con = sqlite3.connect(dbfile)
    chunksize = max(1, len(df) // 10)  # about 10% of the rows, never 0
    with tqdm(total=len(df)) as pbar:
        for i, cdf in enumerate(chunker(df, chunksize)):
            # replace the table on the first chunk, append afterwards
            replace = "replace" if i == 0 else "append"
            cdf.to_sql(con=con, name="MLS", if_exists=replace, index=False)
            pbar.update(len(cdf))  # the last chunk may be shorter
    con.close()

df = pd.DataFrame({'a': range(0, 100000)})
insert_with_progress(df, DB_FILENAME)

Note that I'm generating the DataFrame inline here for the sake of having a complete, workable example without dependencies.

The result is quite amazing.
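Since the question is about a large CSV file, a variation that may be worth considering is to read the file in chunks as well, so the whole DataFrame never has to sit in memory. The following is only a minimal sketch under stated assumptions: the function name csv_to_sqlite_in_chunks and its on_chunk hook are hypothetical names made up for illustration, not part of pandas or tqdm. It relies on pd.read_csv with the chunksize argument, which yields DataFrames of at most that many rows.

```python
import sqlite3

import pandas as pd


def csv_to_sqlite_in_chunks(csv_path, db_path, table, chunksize=10000, on_chunk=None):
    """Copy a CSV file into a SQLite table chunk by chunk; return rows written.

    on_chunk is a hypothetical hook for progress reporting: it is called
    with the number of rows in each chunk right after that chunk is written.
    """
    total = 0
    con = sqlite3.connect(db_path)
    try:
        # With chunksize set, read_csv returns an iterator of DataFrames.
        for i, chunk in enumerate(pd.read_csv(csv_path, chunksize=chunksize)):
            # Replace the table on the first chunk, append afterwards.
            mode = "replace" if i == 0 else "append"
            chunk.to_sql(name=table, con=con, if_exists=mode, index=False)
            total += len(chunk)
            if on_chunk is not None:
                on_chunk(len(chunk))
        con.commit()
    finally:
        con.close()
    return total


# Possible usage with tqdm (the total row count is unknown up front,
# so the bar shows a running count rather than a percentage):
#
#     from tqdm import tqdm
#     with tqdm(unit="rows") as pbar:
#         csv_to_sqlite_in_chunks(MLS_FULLPATH, DB_FILENAME, "MLS",
#                                 on_chunk=pbar.update)
```

Passing the progress update in as a callback keeps the loading function independent of tqdm, so the same function could also be driven by a plain print or a logger.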
