Multithreading for faster downloading


Problem description


How can I download multiple links simultaneously? My script below works, but it downloads only one page at a time and is extremely slow. I can't figure out how to incorporate multithreading into it.

The Python script:

from BeautifulSoup import BeautifulSoup
import lxml.html as html
import urlparse
import os, sys
import urllib2
import re

print ("downloading and parsing Bibles...")
root = html.parse(open('links.html'))
for link in root.findall('//a'):
  url = link.get('href')
  name = urlparse.urlparse(url).path.split('/')[-1]
  dirname = urlparse.urlparse(url).path.split('.')[-1]
  f = urllib2.urlopen(url)
  s = f.read()
  if (os.path.isdir(dirname) == 0): 
    os.mkdir(dirname)
  soup = BeautifulSoup(s)
  articleTag = soup.html.body.article
  converted = str(articleTag)
  full_path = os.path.join(dirname, name)
  open(full_path, 'w').write(converted)
  print(name)

The HTML file called links.html:

<a href="http://www.youversion.com/bible/gen.1.nmv-fas">http://www.youversion.com/bible/gen.1.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.2.nmv-fas">http://www.youversion.com/bible/gen.2.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.3.nmv-fas">http://www.youversion.com/bible/gen.3.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.4.nmv-fas">http://www.youversion.com/bible/gen.4.nmv-fas</a>

Solution

It looks to me like the producer-consumer problem (see Wikipedia).

You may use:

from BeautifulSoup import BeautifulSoup
import lxml.html as html
import urlparse
import os
import urllib2
import Queue
import threading

# create a Queue.Queue here
queue = Queue.Queue()

print("downloading and parsing Bibles...")
root = html.parse(open('links.html'))
for link in root.findall('//a'):
  url = link.get('href')
  queue.put(url) # produce

def worker():
  while True:
    try:
      url = queue.get_nowait() # consume
    except Queue.Empty:
      return # no URLs left, this worker is done
    name = urlparse.urlparse(url).path.split('/')[-1]
    dirname = urlparse.urlparse(url).path.split('.')[-1]
    f = urllib2.urlopen(url)
    s = f.read()
    if not os.path.isdir(dirname):
      try:
        os.mkdir(dirname)
      except OSError:
        pass # another thread may have created it first
    soup = BeautifulSoup(s)
    articleTag = soup.html.body.article
    converted = str(articleTag)
    full_path = os.path.join(dirname, name)
    open(full_path, 'wb').write(converted)
    print(name)

# start 4 worker threads and wait for them to drain the queue
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
  t.start()
for t in threads:
  t.join()
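On Python 3, the same fan-out can be done without managing a queue and threads by hand, using `concurrent.futures.ThreadPoolExecutor`. This is a minimal sketch, not the original answer's code: `download_one` here is a hypothetical stand-in for the per-URL work (fetch, parse, write to disk) that only derives the output file name, so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor

def download_one(url):
    # Stand-in for the per-URL body (urlopen + BeautifulSoup + file write);
    # it just extracts the last path component so the example is self-contained.
    return url.rsplit('/', 1)[-1]

urls = [
    "http://www.youversion.com/bible/gen.%d.nmv-fas" % n
    for n in range(1, 5)
]

# The executor runs download_one on up to 4 URLs concurrently;
# map() returns results in input order regardless of completion order.
with ThreadPoolExecutor(max_workers=4) as pool:
    names = list(pool.map(download_one, urls))

print(names)
# → ['gen.1.nmv-fas', 'gen.2.nmv-fas', 'gen.3.nmv-fas', 'gen.4.nmv-fas']
```

The `with` block also handles shutdown: it joins all worker threads before the program continues, which replaces the explicit `start()`/`join()` loop in the queue-based version.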
