Downloading files from an http server in python

Question

Using urllib2, we can get the HTTP response from a web server. If that server simply serves a list of files, we could parse the listing and download each file individually. However, I'm not sure what the easiest, most Pythonic way to parse the listing would be.

When you get the whole HTTP response for a generic file server's listing through urllib2's urlopen() method, how can we neatly download each file?
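
For context on the fetch-and-parse step the question describes: a minimal sketch, assuming Python 2 and a plain HTML index page, that pulls the listing with urllib2 and collects every link with the standard-library HTMLParser (the URL is a placeholder, not from the original question):

import urllib2
from HTMLParser import HTMLParser

class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag in the page."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkParser()
parser.feed(urllib2.urlopen("http://server.domain/path/").read())
print parser.links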

Answer

urllib2 is fine for retrieving the list of files, but for downloading large numbers of binary files, PycURL (http://pycurl.sourceforge.net/) is a better choice. This works for my IIS-based file server:

import re
import urllib2
import pycurl

url = "http://server.domain/"
path = "path/"
# An IIS directory listing renders each entry as <A HREF="/path/name">name</A>;
# the capture group extracts the bare file name.
pattern = '<A HREF="/%s.*?">(.*?)</A>' % path

# Fetch the HTML index page with urllib2.
response = urllib2.urlopen(url + path).read()

for filename in re.findall(pattern, response):
    with open(filename, "wb") as fp:
        # Let PycURL stream the remote file straight into the open handle.
        curl = pycurl.Curl()
        curl.setopt(pycurl.URL, url + path + filename)
        curl.setopt(pycurl.WRITEDATA, fp)
        curl.perform()
        curl.close()
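
If a dependency on PycURL is unwanted, the same loop can be driven by urllib2 alone. A minimal sketch under the same assumptions as the answer above (server.domain and path/ are placeholders), streaming each file to disk with shutil.copyfileobj instead of buffering the whole body in memory:

import re
import shutil
import urllib2

url = "http://server.domain/"
path = "path/"
pattern = '<A HREF="/%s.*?">(.*?)</A>' % path

listing = urllib2.urlopen(url + path).read()

for filename in re.findall(pattern, listing):
    # urllib2 responses are file-like, so copyfileobj can
    # copy them to disk in fixed-size chunks.
    remote = urllib2.urlopen(url + path + filename)
    with open(filename, "wb") as fp:
        shutil.copyfileobj(remote, fp)
    remote.close()

PycURL will generally be faster for many large files, since it reuses connections and does its transfers in C, which is the trade-off the answer is pointing at.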
