How to transform a pandas dataframe for insertion via an executemany() statement?


Problem description

I have a fairly big pandas dataframe (50 or so headers and a few hundred thousand rows of data) and I'm looking to transfer this data to a database using the ceODBC module. Previously I was using pyodbc with a simple execute statement in a for loop, but this was taking ridiculously long (about 10 minutes per 1000 records)...

I'm now trying a new module and am trying to introduce executemany(), although I'm not quite sure what's meant by the sequence of parameters in:

    cursor.executemany("""insert into table.name(a, b, c, d, e)
    values(?, ?, ?, ?, ?)""", sequence_of_parameters)

Should it look like one constant list working through each header, like

    ['asdas', '1', '2014-12-01', 'true', 'asdasd', 'asdas', '2', 
'2014-12-02', 'true', 'asfasd', 'asdfs', '3', '2014-12-03', 'false', 'asdasd']

  • this being an example of three rows
  • or what is the format that's needed?

As another related question, how can I then go about converting a regular pandas dataframe to this format?

Thanks!

Recommended answer

I managed to figure this out in the end. So if you have a pandas dataframe that you want to write to a database using ceODBC, which is the module I used, the code is:

With all_data as the dataframe, map the dataframe values to strings and store each row as a tuple in a list of tuples:

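    # Convert every column to stripped strings (nulls become 'nan', 'NaT' or 'None')
    for col in all_data.columns:
        all_data[col] = all_data[col].map(str).map(str.strip)
    # One tuple per dataframe row: the sequence of parameters for executemany()
    tuples = [tuple(x) for x in all_data.values]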
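
To make the expected shape concrete, here is a minimal sketch that uses Python's built-in sqlite3 module in place of ceODBC (an assumption, made only so the example runs without a database server; both are used here with ? placeholders). The table name demo and its columns are hypothetical. It suggests that executemany() wants one tuple per row rather than a single flat list of values.

```python
import sqlite3

# Hypothetical three-row data set, already converted to one tuple per row.
rows = [
    ("asdas", "1", "2014-12-01", "true", "asdasd"),
    ("asdas", "2", "2014-12-02", "true", "asfasd"),
    ("asdfs", "3", "2014-12-03", "false", "asdasd"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table demo (a text, b text, c text, d text, e text)")

# One "?" per column; executemany() runs the statement once per tuple.
conn.executemany("insert into demo (a, b, c, d, e) values (?, ?, ?, ?, ?)", rows)
conn.commit()

inserted = conn.execute("select count(*) from demo").fetchone()[0]
print(inserted)
```

Each inner tuple must have exactly as many items as there are placeholders in the statement.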
    

In the list of tuples, change all null value signifiers (which were captured as strings in the conversion above) into a null type that can be passed to the end database. This was an issue for me; it might not be for you.

    string_list = ['NaT', 'nan', 'NaN', 'None']

    def remove_wrong_nulls(x):
        # Replace every string listed in x with None, updating the
        # module-level list `tuples` in place.
        nulls = set(x)
        for i, row in enumerate(tuples):
            tuples[i] = tuple(None if v in nulls else v for v in row)

    remove_wrong_nulls(string_list)
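
As a possible variation on the step above, the same replacement can be written as a function that returns a new list instead of modifying a global one, which may be easier to test. This is only a sketch: the names NULL_STRINGS and clean_nulls and the sample rows are hypothetical and not part of the original answer.

```python
# Strings that str() produces for missing values in the conversion step.
NULL_STRINGS = {"NaT", "nan", "NaN", "None"}

def clean_nulls(rows, null_strings=NULL_STRINGS):
    """Return new tuples with null-like strings replaced by None."""
    return [
        tuple(None if value in null_strings else value for value in row)
        for row in rows
    ]

# Hypothetical sample: two rows containing three null-like strings.
sample = [("asdas", "nan", "2014-12-01"), ("None", "2", "NaT")]
cleaned = clean_nulls(sample)
print(cleaned)
```

Note that a genuine text value that happens to equal one of these strings (for example the literal word None) would also be turned into a null, so the set may need adjusting for your data.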
    

Create the connection to the database:

    import ceODBC

    cnxn=ceODBC.connect('DRIVER={SOMEODBCDRIVER};DBCName=XXXXXXXXXXX;UID=XXXXXXX;PWD=XXXXXXX;QUIETMODE=YES;', autocommit=False)
    cursor = cnxn.cursor()
    

Define a function that splits the list of tuples into new_list, a list of chunks of 1000 tuples each. This was necessary for me because the database I was writing to could not accept an SQL query larger than 1MB.

    def chunks(l, n):
        n = max(1, n)
        return [l[i:i + n] for i in range(0, len(l), n)]
    
    new_list = chunks(tuples, 1000)
    

Define the query:

    query = """insert into XXXXXXXXXXXX("XXXXXXXXXX", "XXXXXXXXX", "XXXXXXXXXXX") values(?,?,?)"""
    

Run through new_list, which contains the list of tuples in groups of 1000, and perform executemany on each group. Follow this by committing and closing the connection, and that's it :)

    for chunk in new_list:
        cursor.executemany(query, chunk)
    cnxn.commit()
    cnxn.close()
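
The chunked insert can also be tried end to end without a database server. The sketch below again substitutes the built-in sqlite3 module for ceODBC (an assumption), and the helper insert_in_chunks, the table demo and the generated rows are all hypothetical. It additionally rolls back if any chunk fails, which the original loop does not do.

```python
import sqlite3

def chunks(rows, n):
    """Split rows into consecutive lists of at most n items."""
    n = max(1, n)
    return [rows[i:i + n] for i in range(0, len(rows), n)]

def insert_in_chunks(conn, query, rows, chunk_size=1000):
    """Call executemany() once per chunk; commit at the end, roll back on error."""
    cursor = conn.cursor()
    try:
        for chunk in chunks(rows, chunk_size):
            cursor.executemany(query, chunk)
        conn.commit()
    except Exception:
        conn.rollback()
        raise

conn = sqlite3.connect(":memory:")
conn.execute("create table demo (a text, b text, c text)")

# 2500 hypothetical rows; the None in the last position is stored as NULL.
rows = [(str(i), "x", None) for i in range(2500)]
insert_in_chunks(conn, "insert into demo (a, b, c) values (?, ?, ?)", rows)

total = conn.execute("select count(*) from demo").fetchone()[0]
null_count = conn.execute("select count(*) from demo where c is null").fetchone()[0]
print(total, null_count)
```

Committing once after all chunks keeps the load all-or-nothing; committing inside the loop instead would keep earlier chunks if a later one fails.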
    
