Multithreading for faster downloading


Problem description


How can I download multiple links simultaneously? My script below works, but it downloads only one page at a time and is extremely slow. I can't figure out how to incorporate multithreading into it.

The Python script:

from BeautifulSoup import BeautifulSoup
import lxml.html as html
import urlparse
import os, sys
import urllib2
import re

print ("downloading and parsing Bibles...")
root = html.parse(open('links.html'))
for link in root.findall('//a'):
  url = link.get('href')
  name = urlparse.urlparse(url).path.split('/')[-1]
  dirname = urlparse.urlparse(url).path.split('.')[-1]
  f = urllib2.urlopen(url)
  s = f.read()
  if (os.path.isdir(dirname) == 0): 
    os.mkdir(dirname)
  soup = BeautifulSoup(s)
  articleTag = soup.html.body.article
  converted = str(articleTag)
  full_path = os.path.join(dirname, name)
  open(full_path, 'w').write(converted)
  print(name)

The HTML file called links.html:

<a href="http://www.youversion.com/bible/gen.1.nmv-fas">http://www.youversion.com/bible/gen.1.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.2.nmv-fas">http://www.youversion.com/bible/gen.2.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.3.nmv-fas">http://www.youversion.com/bible/gen.3.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.4.nmv-fas">http://www.youversion.com/bible/gen.4.nmv-fas</a>

Solution

It looks to me like the producer-consumer problem (see Wikipedia).

You may use:

from BeautifulSoup import BeautifulSoup
import lxml.html as html
import urlparse
import os
import urllib2
import Queue
import threading

# create a Queue.Queue here
queue = Queue.Queue()

print("downloading and parsing Bibles...")
root = html.parse(open('links.html'))
for link in root.findall('//a'):
  url = link.get('href')
  queue.put(url) # produce

def worker():
  while True:
    try:
      url = queue.get_nowait() # consume
    except Queue.Empty:
      return # no URLs left, this worker is done
    name = urlparse.urlparse(url).path.split('/')[-1]
    dirname = urlparse.urlparse(url).path.split('.')[-1]
    f = urllib2.urlopen(url)
    s = f.read()
    if not os.path.isdir(dirname):
      try:
        os.mkdir(dirname)
      except OSError:
        pass # another thread may have created it first
    soup = BeautifulSoup(s)
    articleTag = soup.html.body.article
    converted = str(articleTag)
    full_path = os.path.join(dirname, name)
    open(full_path, 'wb').write(converted)
    print(name)

# start 4 worker threads and wait for them to drain the queue
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
  t.start()
for t in threads:
  t.join()
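On Python 3, the same fan-out can be done without managing a queue and threads by hand, using `concurrent.futures.ThreadPoolExecutor`. This is a minimal sketch, not the original answer's code: `download_one` here is a hypothetical stand-in for the per-URL work (fetch, parse, write to disk) that only derives the output file name, so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor

def download_one(url):
    # Stand-in for the per-URL body (urlopen + BeautifulSoup + file write);
    # it just extracts the last path component so the example is self-contained.
    return url.rsplit('/', 1)[-1]

urls = [
    "http://www.youversion.com/bible/gen.%d.nmv-fas" % n
    for n in range(1, 5)
]

# The executor runs download_one on up to 4 URLs concurrently;
# map() returns results in input order regardless of completion order.
with ThreadPoolExecutor(max_workers=4) as pool:
    names = list(pool.map(download_one, urls))

print(names)
# → ['gen.1.nmv-fas', 'gen.2.nmv-fas', 'gen.3.nmv-fas', 'gen.4.nmv-fas']
```

The `with` block also handles shutdown: it joins all worker threads before the program continues, which replaces the explicit `start()`/`join()` loop in the queue-based version.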
