How do I multithread SQL queries in Python such that I obtain the results of all of the queries?


Question

Is there a way to use threads to run the SQL queries simultaneously, so that I can cut down the processing time of the code below? Is there a better way to get the same result without using the pandas module? Given the size of the data sets I am working with, I cannot hold the entire data set in memory, and I have found that looping over the rows of a SELECT * FROM statement and comparing them against the list of ids I am querying with adds to the processing time.

# DATABASE layout
#  _____________________________________________________________
# |     id      |         name       |        description       |
# |_____________|____________________|__________________________|
# |        1    |         John       |       Credit Analyst     |
# |        2    |         Jane       |          Doctor          |
# |      ...    |          ...       |            ...           |
# |  5000000    |       Mohammed     |         Dentist          |
# |_____________|____________________|__________________________|

import sqlite3


SEARCH_IDS = [x for x in range(15000)]
DATABASE_NAME = 'db.db'

def chunks(wholeList, chunkSize=999):
    """Yield successive chunkSize-sized chunks from wholeList."""
    for i in range(0, len(wholeList), chunkSize):
        yield wholeList[i:i + chunkSize]

def search_database_for_matches(listOfIdsToMatch):
    '''Takes a list of ids and returns the matching rows'''
    conn = sqlite3.connect(DATABASE_NAME)
    try:
        cursor = conn.cursor()
        # One "?" placeholder per id keeps the query parameterized
        sql = "SELECT id, name, description FROM datatable WHERE id IN ({})".format(', '.join("?" for _ in listOfIdsToMatch))
        cursor.execute(sql, tuple(listOfIdsToMatch))
        return cursor.fetchall()
    finally:
        conn.close()  # release the connection even if the query fails

def arrange(orderOnList,listToBeOrdered,defaultReturnValue='N/A'):
    '''Takes a list of ids in the desired order and a list of tuples whose first item is an id.
       The tuples are arranged into a new list that follows the order of the id list;
       ids with no matching tuple keep defaultReturnValue.'''
    from collections import OrderedDict
    resultList=[defaultReturnValue for x in orderOnList]
    indexLookUp = OrderedDict( [ ( value , key )   for   key , value   in enumerate( orderOnList ) ] )
    for item in listToBeOrdered:
        resultList[indexLookUp[item[0]]]=item
    return resultList


def main():
    results=[]
    for chunk in chunks(SEARCH_IDS,999):
        results += search_database_for_matches(chunk)
    results = arrange(SEARCH_IDS,results)
    print(results)


if __name__ == '__main__': main()

Recommended answer

Some suggestions:

Instead of reading the records in chunks using an iterator, you ought to use pagination.

See the following questions:

  • Efficient paging in SQLite with millions of records
  • Sqlite LIMIT / OFFSET query
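As a rough illustration of what pagination could look like here, the sketch below uses keyset pagination (`WHERE id > ? ORDER BY id LIMIT ?`) rather than `LIMIT`/`OFFSET`, since large offsets still make SQLite step over the skipped rows. The helper names `iter_rows_keyset` and `demo`, the page size, and the small in-memory table are assumptions made for the example; only the `datatable` layout comes from the question.

```python
import sqlite3


def iter_rows_keyset(conn, page_size=1000):
    """Yield every row of datatable in id order, fetching one page at a time."""
    last_id = -1  # assumption: ids are non-negative integers
    while True:
        page = conn.execute(
            "SELECT id, name, description FROM datatable "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not page:
            break
        yield from page
        last_id = page[-1][0]  # resume after the last id already seen


def demo():
    """Build a tiny in-memory table (hypothetical data) and page through it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datatable (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO datatable VALUES (?, ?, ?)",
        [(i, f"name{i}", f"job{i}") for i in range(1, 26)],
    )
    rows = list(iter_rows_keyset(conn, page_size=10))
    conn.close()
    return rows
```

Only one page of rows is held in memory at a time, which fits the constraint in the question that the whole data set cannot be loaded at once.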

If you're using multithreading or multiprocessing, make sure your database supports it. See: SQLite And Multiple Threads
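One practical consequence in Python: by default a `sqlite3` connection may only be used from the thread that created it (the `check_same_thread` default), so a common approach is to give each thread its own connection. The following is a minimal sketch of that idea using `threading.local`; `get_connection`, `demo`, and the temporary database file are hypothetical names and data introduced for the example.

```python
import os
import sqlite3
import tempfile
import threading

_local = threading.local()  # holds one connection per thread


def get_connection(db_path):
    """Return a connection private to the calling thread, opening it on first use.

    Simplification: assumes a single database path for the whole program.
    """
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(db_path)
        _local.conn = conn
    return conn


def demo():
    """Query the same database file from three threads, one connection each."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        setup = sqlite3.connect(db_path)
        setup.execute(
            "CREATE TABLE datatable (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
        )
        setup.execute("INSERT INTO datatable VALUES (1, 'John', 'Credit Analyst')")
        setup.commit()
        setup.close()

        results = []

        def worker():
            conn = get_connection(db_path)
            row = conn.execute("SELECT name FROM datatable WHERE id = 1").fetchone()
            results.append(row)
            conn.close()  # close before the temporary directory is removed
            _local.conn = None

        threads = [threading.Thread(target=worker) for _ in range(3)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results
```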

To implement what you want, you can use a pool of workers, each of which works on one chunk. See "Using a pool of workers" in the Python documentation.

Example:

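import multiprocessing

# When run as a script, create the pool under `if __name__ == '__main__':`
with multiprocessing.Pool(processes=4) as pool:
    # pool.map returns one list of rows per chunk, in chunk order
    result = pool.map(search_database_for_matches, list(chunks(SEARCH_IDS, 999)))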
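Since the question asks about threads specifically, here is a hedged end-to-end sketch of the same idea with `concurrent.futures.ThreadPoolExecutor` in place of a process pool. Each worker opens its own connection, the per-chunk results are merged, and the rows are returned in the order of the requested ids. The names `search_chunk`, `search_ids_concurrently`, and `demo`, plus the temporary database, are assumptions for illustration. Whether this is actually faster than the sequential loop depends on the database and the workload, so it is worth measuring.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing
from functools import partial


def chunks(whole_list, chunk_size=999):
    """Yield successive chunk_size-sized chunks from whole_list."""
    for i in range(0, len(whole_list), chunk_size):
        yield whole_list[i:i + chunk_size]


def search_chunk(db_path, ids):
    """Fetch the rows for one chunk of ids using a connection owned by this call."""
    placeholders = ", ".join("?" * len(ids))
    sql = f"SELECT id, name, description FROM datatable WHERE id IN ({placeholders})"
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(sql, ids).fetchall()


def search_ids_concurrently(db_path, ids, default="N/A", workers=4):
    """Run one query per chunk on a thread pool; return rows in the order of ids."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = list(pool.map(partial(search_chunk, db_path), chunks(ids)))
    # Flatten the per-chunk lists into an id -> row lookup
    by_id = {row[0]: row for rows in per_chunk for row in rows}
    return [by_id.get(i, default) for i in ids]


def demo():
    """Create a temporary database (hypothetical data) and query it in reverse id order."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        with closing(sqlite3.connect(db_path)) as conn:
            conn.execute(
                "CREATE TABLE datatable (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
            )
            conn.executemany(
                "INSERT INTO datatable VALUES (?, ?, ?)",
                [(i, f"name{i}", f"job{i}") for i in range(1, 3001)],
            )
            conn.commit()
        wanted = list(range(2500, -1, -1))  # id 0 does not exist in the table
        return search_ids_concurrently(db_path, wanted)
```

The dictionary lookup at the end plays the role of `arrange` in the question: ids with no matching row come back as the default value.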
