What is a good practice/design to thread multiple SQL queries in Python


Problem description

I am using MySQLdb and BeautifulSoup.

The website is organized into about 15 different cities, and each city has anywhere from 10 to 150 pages. There are about 500 pages in total.

For each page per city, I open the site using BeautifulSoup, extract all the necessary information, then perform an INSERT INTO or UPDATE SQL query.

Currently I am not using threads, and it takes a few minutes to go through all 500 pages, because the Python program...

  1. Opens a page.
  2. Extracts the information.
  3. Performs the SQL query.
  4. Opens the next page...

Ideally I would want to balance the load by having, say, 10 concurrent threads that each open about 50 pages. But I think that may be too complicated to code.

So instead I am thinking of having one thread per city. How would I accomplish this?

Currently my code looks something like this:

# import threading
from BeautifulSoup import BeautifulSoup
import urllib2
import MySQLdb

con = MySQLdb.connect( ... )

def open_page( url ):
    cur = con.cursor()
    # do SQL query

# List of city URLs

cities = [
    'http://example.com/atlanta/',
    'http://example.com/los-angeles/',
    ...
    'http://example.com/new-york/'
]

for city_url in cities:
    soup = BeautifulSoup( urllib2.urlopen( city_url ) )

    # find every page per city
    pages = soup.findAll( 'div', { 'class' : 'page' } )

    for page in pages:
        page_url = page.find( 'a' )[ 'href' ]
        open_page( page_url )

Recommended answer

Your initial idea is absolutely feasible. Just start 10 worker threads that all wait for input on one and the same queue. Then your main process puts the URLs into this queue, and the load balancing will happen automatically.
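
Below is a minimal sketch of that pattern, in the same Python 2 style as the question (where the standard-library queue module is spelled Queue); the worker count, queue name, and all_page_urls list are illustrative, not from the original post:

import Queue       # Python 2 name of the standard-library queue module
import threading
import urllib2
from BeautifulSoup import BeautifulSoup

NUM_WORKERS = 10          # illustrative; tune to your bandwidth
url_queue = Queue.Queue() # the main process feeds page URLs in here

def worker():
    while True:
        page_url = url_queue.get()
        try:
            soup = BeautifulSoup( urllib2.urlopen( page_url ) )
            # ... extract the information and do the SQL work here ...
        finally:
            url_queue.task_done()  # always mark the URL as handled

for _ in range( NUM_WORKERS ):
    t = threading.Thread( target = worker )
    t.daemon = True  # daemon threads die when the main thread exits
    t.start()

# Collect every page URL first (the per-city loop from the question),
# then let the pool drain the queue. Idle workers simply pick up the
# next item, which is where the automatic load balancing comes from.
all_page_urls = []  # e.g. filled from the per-city 'div.page' links
for page_url in all_page_urls:
    url_queue.put( page_url )

url_queue.join()  # block until every queued page has been processed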

If your SQL bindings are thread-safe, you can do the INSERT or UPDATE work in the worker threads. Otherwise, I'd add one more thread for the SQL work, waiting for input on a different queue. Then your worker threads would put each query into this queue, and the SQL thread would execute it.
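
Under the same assumptions, a sketch of that single-writer variant might look like the following; the sql_queue name, the STOP sentinel, the connection parameters, and the pages table in the example query are all made up for illustration:

import Queue
import threading
import MySQLdb

sql_queue = Queue.Queue()  # workers put (query, params) tuples in here
STOP = object()            # sentinel telling the SQL thread to finish

def sql_writer():
    # The connection lives entirely in this one thread, so the
    # bindings never have to be shared between threads.
    con = MySQLdb.connect( host='localhost', user='user',
                           passwd='password', db='mydb' )  # placeholders
    cur = con.cursor()
    while True:
        item = sql_queue.get()
        if item is STOP:
            break
        query, params = item
        cur.execute( query, params )
        con.commit()  # commit per item; batch these if it proves slow
    con.close()

writer = threading.Thread( target = sql_writer )
writer.start()

# A worker thread hands its result off instead of touching the DB;
# the table and columns here are hypothetical:
sql_queue.put( ( "INSERT INTO pages (url, body) VALUES (%s, %s)",
                 ( "http://example.com/atlanta/page-1", "scraped text" ) ) )

# Once all worker threads have finished:
sql_queue.put( STOP )
writer.join()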

If you google for "python worker threads queue" you'll find a few examples.
