How to insert CSV file data into MySQL efficiently using Python?


Problem description

I have a CSV input file with approx. 4 million records. The insert has been running for over 2 hours and still has not finished; the database is still empty.

Are there any suggestions on how to actually perform the insert (using INSERT INTO) faster, for example by breaking it into chunks?

I'm pretty new to Python.

  • Sample CSV file
43293,cancelled,1,0.0,
1049007,cancelled,1,0.0,
438255,live,1,0.0,classA
1007255,xpto,1,0.0,

  • Python script

    def csv_to_DB(xing_csv_input, db_opts):
        print("Inserting csv file {} to database {}".format(xing_csv_input, db_opts['host']))
        conn = pymysql.connect(**db_opts)
        cur = conn.cursor()
        try:
            with open(xing_csv_input, newline='') as csvfile:
                csv_data = csv.reader(csvfile, delimiter=',', quotechar='"')
                for row in csv_data:
                    insert_str = "INSERT INTO table_x (ID, desc, desc_version, val, class) VALUES (%s, %s, %s, %s, %s)"
                    cur.execute(insert_str, row)
            conn.commit()
        finally:
            conn.close()
    

    UPDATE: Thanks for all the input. As suggested, I tried a counter to insert in batches of 100, on a smaller CSV data set (1,000 lines). The problem now is that only 100 records are inserted, although the counter passes 100 ten times.

    Code changes:

    def csv_to_DB(xing_csv_input, db_opts):
       print("Inserting csv file {} to database {}".format(xing_csv_input, db_opts['host']))
       conn = pymysql.connect(**db_opts)
       cur = conn.cursor()
       count = 0
       try:
           with open(xing_csv_input, newline='') as csvfile:
               csv_data = csv.reader(csvfile, delimiter=',', quotechar='"')
               for row in csv_data:
                   count += 1
                   print(count)
                   insert_str = "INSERT INTO table_x (ID, desc, desc_version, val, class) VALUES (%s, %s, %s, %s, %s)"
    
                   if count >= 100:
                      cur.execute(insert_str, row)
                      print("count100")
                      conn.commit()
                      count = 0
    
                   if not row:
                      cur.execute(insert_str, row)
                      conn.commit()
       finally:
           conn.close()
    

    Recommended answer

    There are many ways to optimise this insert. Here are some ideas:

    1. You have a for loop over the entire dataset. You can do a commit() every 100 rows or so.
    2. You can insert many rows in one INSERT statement.
    3. You can combine the two and do a multi-row insert every 100 rows of your CSV.
    4. If Python is not a requirement, you can do it directly in MySQL, as explained here. (If you must use Python, you can still prepare that statement in Python and avoid looping through the file manually.)
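The MySQL-side approach in point 4 is presumably `LOAD DATA INFILE`, MySQL's bulk CSV loader (an assumption here, since the original link is not preserved). A minimal sketch that builds such a statement for the sample file layout above; the file name `xing.csv` is made up, the table and column names mirror the question's example, and actually running it requires a connection opened with `local_infile=True` plus a server that permits `local_infile`:

```python
# Build a LOAD DATA LOCAL INFILE statement matching the sample CSV layout.
# `desc` is a reserved word in MySQL, hence the backticks.
load_sql = (
    "LOAD DATA LOCAL INFILE 'xing.csv' "
    "INTO TABLE table_x "
    "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
    "LINES TERMINATED BY '\\n' "
    "(ID, `desc`, desc_version, val, class)"
)

# With a live connection this would be executed as:
#   conn = pymysql.connect(local_infile=True, **db_opts)
#   conn.cursor().execute(load_sql)
#   conn.commit()
print(load_sql)
```

This hands the whole file to the server in one statement, which is typically far faster than any row-by-row insert loop.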

    Example:

    For number 2 in the list, the code will have the following structure:

    def csv_to_DB(xing_csv_input, db_opts):
        print("Inserting csv file {} to database {}".format(xing_csv_input, db_opts['host']))
        conn = pymysql.connect(**db_opts)
        cur = conn.cursor()
        try:
            with open(xing_csv_input, newline='') as csvfile:
                csv_data = csv.reader(csvfile, delimiter=',', quotechar='"')
                to_insert = []
                # `desc` is a reserved word in MySQL, so it must be backtick-quoted
                insert_str = "INSERT INTO table_x (ID, `desc`, desc_version, val, class) VALUES "
                template = '(%s, %s, %s, %s, %s)'
                count = 0
                for row in csv_data:
                    count += 1
                    to_insert.append(tuple(row))
                    if count % 100 == 0:
                        # multi-row insert: join the (%s, ...) groups with commas and
                        # pass the values as parameters so pymysql escapes them
                        query = insert_str + ', '.join([template] * len(to_insert))
                        cur.execute(query, [val for r in to_insert for val in r])
                        to_insert = []
                        conn.commit()
                if to_insert:
                    # flush the remaining rows (fewer than 100)
                    query = insert_str + ', '.join([template] * len(to_insert))
                    cur.execute(query, [val for r in to_insert for val in r])
                    conn.commit()
        finally:
            conn.close()
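An alternative to building the multi-row statement by hand is to chunk the rows in Python and hand each chunk to `cur.executemany`, which pymysql rewrites into a single multi-row INSERT for simple `INSERT ... VALUES` queries. A sketch of the chunking part only (the generator name `batches` is made up for illustration; the database calls are shown in comments, and the demo uses an in-memory CSV so no server is needed):

```python
import csv
import io

def batches(rows, size=100):
    """Yield lists of up to `size` tuples from any iterable of rows."""
    batch = []
    for row in rows:
        batch.append(tuple(row))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # don't forget the last, partial batch
        yield batch

# Demo on an in-memory CSV shaped like the sample above.
sample = io.StringIO(
    "43293,cancelled,1,0.0,\n"
    "438255,live,1,0.0,classA\n"
    "1007255,xpto,1,0.0,\n"
)
chunks = list(batches(csv.reader(sample), size=2))
# Each chunk would then go to the cursor, e.g.:
#   cur.executemany("INSERT INTO table_x (ID, `desc`, desc_version, val, class) "
#                   "VALUES (%s, %s, %s, %s, %s)", chunk)
#   conn.commit()
```

This keeps the SQL parameterized throughout and avoids the easy-to-miss bug of dropping the final partial batch.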
    

