Speed up inserting large datasets from txt file to MySQL using Python


Question

Background: I have 500 formatted *.txt files that I need to insert into a MySQL database. Currently I have a Python script that reads the files line by line and inserts each line into the database.

Problem: the files are quite big (~100 MB per txt file), and when I tested the script it took far too long to insert even a single file into the database.

How can I speed up the process by modifying the script?

Code:

import os
import re
import MySQLdb  # or any other DB-API driver; the connection details below are placeholders

INPUTFILEPATH = "/path/to/txt/files"  # placeholder

db = MySQLdb.connect(host="localhost", user="user", passwd="passwd", db="mydb")
cursor = db.cursor()

for file in os.listdir(INPUTFILEPATH):
    inputfilename = os.path.join(INPUTFILEPATH, file)
    with open(inputfilename, 'r') as open_file:
        # iterate lazily instead of calling readlines() on a ~100 MB file
        for lineString in open_file:
            lineString = lineString.rstrip('\n')
            values = lineString.split('\t')
            # skip lines whose word contains a digit, underscore, quote or dot
            if re.search(r"[0-9_'.]", values[0]):
                continue
            message = "INSERT INTO %s (word, year, count, volume) VALUES ('%s', '%s', '%s', '%s')" % (
                '1gram', values[0], values[1], values[2], values[3])
            cursor.execute(message)
            db.commit()  # one commit per row -- this is the main bottleneck

cursor.close()
db.close()

Answer

Two options to consider:

1) The easiest is to include multiple rows of values in one insert. This is way, way faster than doing many single-row inserts.

Instead of INSERT INTO tbl ( cols ) VALUES ( vals ), do something like INSERT INTO tbl ( cols ) VALUES ( vals ), ( vals ), ( vals ).

The number of rows you can insert at once depends on the MySQL server's maximum packet size, but you can probably do 100, 1000, maybe 10000 rows safely, and it should give you a performance increase of an order of magnitude or more.

See http://dev.mysql.com/doc/refman/5.5/en/insert-speed.html
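
A minimal sketch of this idea, assuming the MySQLdb driver (whose executemany rewrites a plain INSERT ... VALUES into one multi-row statement) and reusing the cursor, connection, and word filter from the question; the batch size is an assumption to tune:

BATCH_SIZE = 1000  # stay well under the server's max_allowed_packet

sql = "INSERT INTO `1gram` (word, year, count, volume) VALUES (%s, %s, %s, %s)"

for file in os.listdir(INPUTFILEPATH):
    batch = []
    with open(os.path.join(INPUTFILEPATH, file), 'r') as f:
        for line in f:
            values = line.rstrip('\n').split('\t')
            if re.search(r"[0-9_'.]", values[0]):
                continue  # same word filter as in the question
            batch.append(tuple(values[:4]))
            if len(batch) >= BATCH_SIZE:
                cursor.executemany(sql, batch)  # one round trip for 1000 rows
                batch = []
    if batch:
        cursor.executemany(sql, batch)  # flush the remainder
    db.commit()  # commit once per file instead of once per row

Using parameterized %s placeholders also lets the driver handle quoting and escaping, which the string-formatted INSERT in the question does not.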

2) LOAD DATA INFILE is a bit different, requires more work and has its own requirements, but it is very, very fast.
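
A sketch of what that could look like from Python, assuming the files are tab-separated as above and that both the client and the server permit LOCAL INFILE; note that the word filter from the question would have to move to a preprocessing pass or a DELETE after loading:

import os
import MySQLdb

# local_infile=1 enables LOAD DATA LOCAL INFILE on the client side; the
# server's local_infile setting must allow it too.
db = MySQLdb.connect(host="localhost", user="user", passwd="passwd",
                     db="mydb", local_infile=1)
cursor = db.cursor()

for file in os.listdir(INPUTFILEPATH):
    path = os.path.join(INPUTFILEPATH, file)
    # MySQLdb interpolates %s client-side into a quoted string literal,
    # which is the form LOAD DATA expects for the file name.
    cursor.execute(
        "LOAD DATA LOCAL INFILE %s INTO TABLE `1gram` "
        "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' "
        "(word, year, count, volume)",
        (path,)
    )
    db.commit()

cursor.close()
db.close()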
