Downloading files from an HTTP server in Python


Problem description

Using urllib2, we can get the HTTP response from a web server. If that server simply holds a list of files, we could parse through the list and download each file individually. However, I'm not sure what the easiest, most Pythonic way to parse through the files would be.

When you get the whole HTTP response for a generic file-server listing through urllib2's urlopen() method, how can we neatly download each file?

Answer

urllib2 is probably fine for retrieving the list of files. For downloading large numbers of binary files, PycURL (http://pycurl.sourceforge.net/) is a better choice. This works for my IIS-based file server:

import re
import urllib2
import pycurl

url = "http://server.domain/"
path = "path/"
# Capture the file names from the <A HREF="..."> links in the IIS directory listing.
pattern = '<A HREF="/%s.*?">(.*?)</A>' % path

# Fetch the directory listing page with urllib2.
response = urllib2.urlopen(url + path).read()

# Download each listed file with PycURL, writing it straight to a local file.
for filename in re.findall(pattern, response):
    fp = open(filename, "wb")
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url + path + filename)
    curl.setopt(pycurl.WRITEDATA, fp)
    curl.perform()
    curl.close()
    fp.close()
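
For readers on Python 3, where urllib2 no longer exists, a roughly equivalent sketch can use urllib.request for both the listing and the downloads. This is only an adaptation of the answer above, not part of the original solution; the url, path, and <A HREF="..."> pattern are assumptions carried over unchanged, so adjust them to whatever markup your server's listing actually produces.

import re
import urllib.request

url = "http://server.domain/"
path = "path/"
# Assumes the same IIS-style listing markup as the answer above.
pattern = '<A HREF="/%s.*?">(.*?)</A>' % path

# Fetch the directory listing and decode it so the regex can run on text.
listing = urllib.request.urlopen(url + path).read().decode("utf-8", errors="replace")

# Stream each file to disk in chunks so large binaries never sit fully in memory.
for filename in re.findall(pattern, listing):
    with urllib.request.urlopen(url + path + filename) as response, open(filename, "wb") as fp:
        while True:
            chunk = response.read(64 * 1024)
            if not chunk:
                break
            fp.write(chunk)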
