Download file from Blob URL with Python
Question
I wish to have my Python script download the Master data (Download, XLSX) Excel file from this Frankfurt stock exchange webpage.
When I try to retrieve it with urllib and wget, it turns out that the URL leads to a Blob, and the downloaded file is only 289 bytes and unreadable.
I'm entirely unfamiliar with Blobs and have these questions:
- Can the file "behind the Blob" be successfully retrieved using Python?
- If so, is it necessary to uncover the "true" URL behind the Blob, if there is such a thing, and how? My concern is that the link above is not static and may change often.
Recommended answer
That 289-byte response might be the HTML of a 403 Forbidden page. This happens because the server rejects requests that do not specify a user agent.
# python3
import urllib.request as request
url = 'http://www.xetra.com/blob/1193366/b2f210876702b8e08e40b8ecb769a02e/data/All-tradable-ETFs-ETCs-and-ETNs.xlsx'
# fake user agent of Safari
fake_useragent = 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25'
r = request.Request(url, headers={'User-Agent': fake_useragent})
f = request.urlopen(r)
# print or write
print(f.read())
Python 2
# python2
import urllib2
url = 'http://www.xetra.com/blob/1193366/b2f210876702b8e08e40b8ecb769a02e/data/All-tradable-ETFs-ETCs-and-ETNs.xlsx'
# fake user agent of safari
fake_useragent = 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25'
r = urllib2.Request(url, headers={'User-Agent': fake_useragent})
f = urllib2.urlopen(r)
print(f.read())
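The snippets above print the raw bytes. To keep the spreadsheet, the response can be written to disk in binary mode instead. The following is only a minimal Python 3 sketch of that idea: the names USER_AGENT, build_request, download and looks_like_xlsx are hypothetical helpers introduced here, and the user agent string is a placeholder, on the assumption that any browser-like value is accepted. The check at the end relies on the fact that XLSX files are ZIP archives, so an HTML error page saved by mistake would fail it.

```python
import shutil
import urllib.request
import zipfile

# Placeholder value (assumption): any browser-like user agent may work.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dest_path, user_agent=USER_AGENT):
    """Stream the response body to dest_path as raw bytes.

    urlopen raises urllib.error.HTTPError for responses such as 403,
    and the destination file is only opened once the request succeeded.
    """
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path


def looks_like_xlsx(path):
    """Rough sanity check: XLSX files are ZIP archives, HTML pages are not."""
    return zipfile.is_zipfile(path)
```

Calling download(url, 'All-tradable-ETFs-ETCs-and-ETNs.xlsx') with the url from the answer would then save the file next to the script, provided the server accepts the header. If looks_like_xlsx returns False for the saved file, the server most likely sent an error page rather than the spreadsheet.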