python requests link headers


Problem description

I'm trying to find the best way to capture the links listed in the response headers, exactly like this one, using the Python requests module. Below is the link to the Link Headers section of the Python Requests documentation: docs.python-requests.org/en/latest/user/advanced/

However, in my case the response headers contain links like the following:

{'content-length': '12276', 'via': '1.1 varnish-v4', 'links': '<http://justblahblahblah.com/link8.html>;rel="last">,<http://justblahblahblah.com/link2.html>;rel="next">', 'vary': 'Accept-Encoding, Origin'}

Notice the extra > after "last" (and after "next"), which does not appear in the Requests examples. I just can't figure out how to handle this.

Solution

You can parse the header's value manually. To make things easier, you might want to use Requests' own parsing function, requests.utils.parse_header_links, as a reference.
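
For the manual route, here is a minimal sketch. It assumes each link-value has the form <url> followed by ;key=value parameters, and that the only deviation from the usual format is a stray > after a parameter value, as in the question. The names parse_links_tolerant and links_by_rel are hypothetical; the parser is only loosely modelled on requests.utils.parse_header_links and needs nothing beyond the standard re module.

```python
import re


def parse_links_tolerant(value):
    """Hypothetical helper: parse a Link-style header value into a list of
    dicts such as {'url': ..., 'rel': ...}, tolerating a stray '>' after a
    parameter value (e.g. rel="last">)."""
    links = []
    value = value.strip()
    if not value:
        return links
    # Link-values are separated by commas; only split where the next
    # non-space character is '<', so commas not followed by '<' are kept.
    for part in re.split(r",\s*(?=<)", value):
        # The URL is everything between the leading '<' and the next '>'.
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if match is None:
            continue  # not a link-value; skip it
        link = {"url": match.group(1).strip()}
        # The remainder looks like ';rel="last">'; each ';' starts a parameter.
        for param in match.group(2).split(";"):
            key, sep, val = param.partition("=")
            if not sep:
                continue  # empty segment or a parameter without '='
            # Strip whitespace, quotes and any stray '>' around the value.
            link[key.strip()] = val.strip(" '\">")
        links.append(link)
    return links


def links_by_rel(value):
    """Key the parsed links by rel (falling back to the URL), the same way
    requests builds Response.links."""
    return {link.get("rel") or link["url"]: link
            for link in parse_links_tolerant(value)}


header = ('<http://justblahblahblah.com/link8.html>;rel="last">,'
          '<http://justblahblahblah.com/link2.html>;rel="next">')
print(links_by_rel(header)["next"]["url"])  # prints http://justblahblahblah.com/link2.html
```

Because the split tolerates whitespace after the comma and the value strip removes whitespace before it, this sketch does not depend on the exact spacing around the separators.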

Alternatively, you can do some find/replace on the value and then pass it to the original parse_header_links:

In [1]: import requests

In [2]: d = {'content-length': '12276', 'via': '1.1 varnish-v4', 'links': '<http://justblahblahblah.com/link8.html>;rel="last">,<http://justblahblahblah.com/link2.html>;rel="next">', 'vary': 'Accept-Encoding, Origin'}

In [3]: requests.utils.parse_header_links(d['links'].rstrip('>').replace('>,<', ',<'))
Out[3]:
[{'rel': 'last', 'url': 'http://justblahblahblah.com/link8.html'},
 {'rel': 'next', 'url': 'http://justblahblahblah.com/link2.html'}]

If there might be a space or two between the >, and the following <, you need to do the replacement with a regular expression instead.
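
A minimal sketch of that regular-expression variant, assuming the same header shape as in the question. The helper name normalize_link_header is hypothetical; it does the same rstrip/replace as the session above, but the \s* parts of the pattern accept any amount of whitespace (including none) on either side of the comma.

```python
import re

# A '>' followed by a comma and the '<' of the next URL, with optional
# whitespace on either side of the comma: matches '>,<', '> ,<', '>,  <', ...
STRAY_SEPARATOR = re.compile(r">\s*,\s*<")


def normalize_link_header(value):
    """Hypothetical helper: regex version of
    value.rstrip('>').replace('>,<', ',<'), plus trimming of surrounding
    whitespace."""
    # Remove surrounding whitespace, then the stray '>' at the very end.
    value = value.strip().rstrip(">")
    # Collapse every '>,<' variant into ',<'.
    return STRAY_SEPARATOR.sub(",<", value)


raw = ('<http://justblahblahblah.com/link8.html>;rel="last">  , <'
       'http://justblahblahblah.com/link2.html>;rel="next">')
print(normalize_link_header(raw))
```

The normalized string can then be passed to requests.utils.parse_header_links as in the session above; for the header in the question it yields exactly the same string as the plain find/replace.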

