How do I get a real file URL in Python 2.7?

Question

I have a URL, http://www.vbb.de/de/datei/GTFS_VBB_Nov2015_Dez2016.zip, which "redirects" me to http://images.vbb.de/assets/ftp/file/286316.zip. I put "redirects" in quotes because Python says there is no redirect:

    In [51]: response = requests.get('http://www.vbb.de/de/datei/GTFS_VBB_Nov2015_Dez2016.zip')
        ...: if response.history:
        ...:     print "Request was redirected"
        ...:     for resp in response.history:
        ...:         print resp.status_code, resp.url
        ...:     print "Final destination:"
        ...:     print response.status_code, response.url
        ...: else:
        ...:     print "Request was not redirected"
        ...:     
    Request was not redirected

The status code is also 200, response.history is empty, and response.url returns the first URL rather than the real one. The real URL is visible in Firefox under Developer Tools -> Network, though. How can I get it in Python 2.7? Thanks in advance!

Recommended answer

You first need to carry out the redirect manually by parsing the new window.location.href value out of the HTML returned by the first request. Requesting that URL then produces a 301 reply whose Location header contains the name of the target file:

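import re
import requests

base_url = 'http://www.vbb.de'

# The first response is an HTML page (status 200) that redirects via JavaScript.
response = requests.get(base_url + '/de/datei/GTFS_VBB_Nov2015_Dez2016.zip')

# Pull the relative redirect target out of the window.location.href assignment.
manual_redirect = base_url + re.findall(r'window\.location\.href\s+=\s+"(.*?)"', response.text)[0]

# Requesting that URL triggers a real HTTP redirect, which requests follows.
response = requests.get(manual_redirect, stream=True)

# The first history entry is the redirect reply; its Location header names the file.
target_filename = response.history[0].headers['Location'].split('/')[-1]

print("Downloading: '{}'".format(target_filename))
with open(target_filename, 'wb') as f_zip:
    # Stream the body to disk in 1 KB chunks instead of loading it all into memory.
    for chunk in response.iter_content(chunk_size=1024):
        f_zip.write(chunk)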

This will display:

Downloading: '286316.zip'

and creates a 29,464,299-byte zip file.
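
The parsing step can be tried without touching the network. The following is a minimal sketch, not the answer's exact code: the helper name extract_js_redirect and the sample HTML are hypothetical (the real markup served by vbb.de is not shown above), and it assumes the page assigns a quoted string to window.location.href. It uses urljoin so that relative and absolute targets both resolve, and returns None instead of raising when no redirect is found.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7

# Same idea as the pattern in the answer, with the dots escaped and
# either quote style accepted (an assumption about the page markup).
JS_REDIRECT = re.compile(r'window\.location\.href\s*=\s*["\'](.*?)["\']')


def extract_js_redirect(html, page_url):
    """Return the absolute target of a window.location.href redirect, or None."""
    match = JS_REDIRECT.search(html)
    if match is None:
        return None
    # Resolve the target against the URL of the page that contained it.
    return urljoin(page_url, match.group(1))


# Hypothetical page body, used only to exercise the helper.
sample_html = '<script>window.location.href = "/hypothetical/path/file.zip";</script>'
page_url = 'http://www.vbb.de/de/datei/GTFS_VBB_Nov2015_Dez2016.zip'

redirect_url = extract_js_redirect(sample_html, page_url)
no_redirect = extract_js_redirect('<p>plain page</p>', page_url)
```

In a real script, response.text and response.url from the first requests.get call would take the place of sample_html and page_url, and a None result would indicate that the page did not contain the expected JavaScript redirect.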
