在MySQL中以增量方式选择记录,并在Python中保存为csv [英] Select records incrementally in MySQL and save to csv in Python

查看:191
本文介绍了在MySQL中以增量方式选择记录,并在Python中保存为csv的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要查询数据库以进行一些数据分析,并且我有超过2000万条记录.我对数据库的访问受到限制,并且我的查询在8分钟后超时.因此,我试图将查询分成较小的部分,并将结果保存到excel中,以便以后处理.

I need to query the database for some data analysis and I have more than 20 millions records. I have limited access to the DB and my query times out after 8 mins. So, I am trying to break up the query into smaller portions and save the results to excel for processing later.

这是我到目前为止所拥有的.我怎样才能让python在每个x数(例如1,000,000)的记录上循环查询并将它们存储在同一csv中,直到搜索到所有(20 mil ++)记录?

This is what I have so far. How can I get python to loop the queries over every x-number (e.g 1,000,000) of records and store them in the same csv until all (20 mil++) records are searched?

import MySQLdb
import csv

db_main = MySQLdb.connect(host="localhost", 
                     port = 1234,
                     user="user1", 
                      passwd="test123", 
                      db="mainDB") 

cur = db_main .cursor()

cur.execute("SELECT a.user_id, b.last_name, b.first_name, 
    FLOOR(DATEDIFF(CURRENT_DATE(), c.birth_date) / 365) age,
    DATEDIFF(b.left_date, b.join_date) workDays
    FROM users a
    INNER JOIN users_signup b ON a.user_id a = b.user_id
    INNER JOIN users_personal c ON a.user_id a = c.user_id
    INNER JOIN
    (
        SELECT distinct d.a.user_id FROM users_signup d
        WHERE (user_id >=1 AND user_id <1000000)
        AND d.join_date >= '2013-01-01' and d.join_date < '2014-01-01'
    ) 
    AS t ON a.user_id = t.user_id") 

result=cur.fetchall()
c = csv.writer(open("temp.csv","wb"))
for row in result:
    c.writerow(row)

推荐答案

以下是可以帮助您的实施示例:

Here is an example of implementation that might help you:

from contextlib import contextmanager
import MySQLdb
import csv

connection_args = {"host": "localhost", "port": 1234, "user": "user1", "passwd": "test123", "db": "mainDB"}

@contextmanager
def get_cursor(**kwargs):
    ''' The contextmanager allow to automatically close
    the cursor.
    '''
    db = MySQLdb.connect(**kwargs)
    cursor = db.cursor()
    try:
        yield cursor
    finally:
        cursor.close()

# note the placeholders for the limits
query = """ SELECT a.user_id, b.last_name, b.first_name,
        FLOOR(DATEDIFF(CURRENT_DATE(), c.birth_date) / 365) age,
        DATEDIFF(b.left_date, b.join_date) workDays
    FROM users a
    INNER JOIN users_signup b ON a.user_id a = b.user_id
    INNER JOIN users_personal c ON a.user_id a = c.user_id
    INNER JOIN
    (
        SELECT distinct d.a.user_id FROM users_signup d
        WHERE (user_id >= 1 AND user_id < 1000000)
        AND d.join_date >= '2013-01-01' and d.join_date < '2014-01-01'
    ) AS t ON a.user_id = t.user_id OFFSET %s LIMIT %s """

csv_file = csv.writer(open("temp.csv","wb"))

# One million at the time
STEP = 1000000
for step_nb in xrange(0, 20):
    with get_cursor(**connection_args) as cursor:
        cursor.execute(query, (step_nb * STEP, (step_nb + 1) * STEP))  # query the DB
        for row in cursor:  # use the cursor instead of fetching everything in memory
            csv_file.writerow(row)

误解了什么是批处理(尽管它在user_id上)

Edited: misunderstanding of what was the Batch (though it was on user_id)

这篇关于在MySQL中以增量方式选择记录,并在Python中保存为csv的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆