使用Python从Blob URL下载文件 [英] Download file from Blob URL with Python

查看:501
本文介绍了使用Python从Blob URL下载文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我希望我的Python脚本从此

I wish to have my Python script download the Master data (Download, XLSX) Excel file from this Frankfurt stock exchange webpage.

当使用urrlibwget进行检索时,发现URL导致了 Blob ,并且下载的文件只有289个字节且不可读.

When to retrieve it with urrlib and wget, it turns out that the URL leads to a Blob and the file downloaded is only 289 bytes and unreadable.

http://www.xetra.com/blob/1193366/b2f210876702b8e08e40b8ecb769a02e/data/All-tradable-ETFs-ETCs-and-ETNs.xlsx

我完全不了解Blob,并有以下问题:

I'm entirely unfamiliar with Blobs and have these questions:

  • 能否使用Python成功检索斑点后面"的文件?

  • Can the file "behind the Blob" be successfully retrieved using Python?

如果是这样,是否有必要揭露Blob背后的真实" URL(如果有的话)以及如何?我在这里担心的是,上面的链接不是静态的,而是经常更改的.

If so, is it necessary to uncover the "true" URL behind the Blob – if there is such a thing – and how? My concern here is that the link above won't be static but actually change often.

推荐答案

长289个字节的内容可能是403 forbidden页面的HTML代码.发生这种情况是因为服务器很智能,并且如果您的代码未指定用户代理,则会拒绝该服务器.

That 289 byte long thing might be a HTML code for 403 forbidden page. This happen because the server is smart and rejects if your code does not specify a user agent.

# python3
import urllib.request as request

url = 'http://www.xetra.com/blob/1193366/b2f210876702b8e08e40b8ecb769a02e/data/All-tradable-ETFs-ETCs-and-ETNs.xlsx'
# fake user agent of Safari
fake_useragent = 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25'
r = request.Request(url, headers={'User-Agent': fake_useragent})
f = request.urlopen(r)

# print or write
print(f.read())

Python 2

# python2
import urllib2

url = 'http://www.xetra.com/blob/1193366/b2f210876702b8e08e40b8ecb769a02e/data/All-tradable-ETFs-ETCs-and-ETNs.xlsx'
# fake user agent of safari
fake_useragent = 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25'

r = urllib2.Request(url, headers={'User-Agent': fake_useragent})
f = urllib2.urlopen(r)

print(f.read())

这篇关于使用Python从Blob URL下载文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆