Get a header with Python and convert it to JSON (requests - urllib2 - json)
Question
I'm trying to get the headers from a website and encode them as JSON so I can write them to a file. I've tried two different ways without success.
FIRST with urllib2 and json
import urllib2
import json
host = ("https://www.python.org/")
header = urllib2.urlopen(host).info()
json_header = json.dumps(header)
print json_header
This way, I get the error:
TypeError: is not JSON serializable
So I tried to bypass this issue by converting the object to a string: json_header = str(header). That way json.dumps no longer fails, but the output is weird:
"Date: Wed, 02 Jul 2014 13:33:37 GMT\r\nServer: nginx\r\nContent-Type: text/html; charset=utf-8\r\nX-Frame-Options: SAMEORIGIN\r\nContent-Length: 45682\r\nAccept-Ranges: bytes\r\nVia: 1.1 varnish\r\nAge: 1263\r\nX-Served-By: cache-fra1220-FRA\r\nX-Cache: HIT\r\nX-Cache-Hits: 2\r\nVary: Cookie\r\nStrict-Transport-Security: max-age=63072000; includeSubDomains\r\nConnection: close\r\n"
SECOND with requests
import requests
r = requests.get("https://www.python.org/")
rh = r.headers
print rh
{'content-length': '45682', 'via': '1.1 varnish', 'x-cache': 'HIT', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=63072000; includeSubDomains', 'vary': 'Cookie', 'server': 'nginx', 'x-served-by': 'cache-fra1226-FRA', 'x-cache-hits': '14', 'date': 'Wed, 02 Jul 2014 13:39:33 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'text/html; charset=utf-8', 'age': '1619'}
This way the output is more JSON-like but still not OK (note the ' ' instead of " " and other things like the = and ;). Evidently there's something (or a lot) I'm not doing the right way. I've tried to read the documentation of the modules but I can't understand how to solve this problem. Thank you for your help.
Recommended answer
There are more than a couple of ways to encode headers as JSON, but my first thought would be to convert the headers attribute to an actual dictionary instead of accessing it as requests.structures.CaseInsensitiveDict
import requests, json
r = requests.get("https://www.python.org/")
rh = json.dumps(r.headers.__dict__['_store'])
print rh
{"content-length": ["content-length", "45474"], "via": ["via", "1.1 varnish"], "x-cache": ["x-cache", "HIT"], "accept-ranges": ["accept-ranges", "bytes"], "strict-transport-security": ["strict-transport-security", "max-age=63072000; includeSubDomains"], "vary": ["vary", "Cookie"], "server": ["server", "nginx"], "x-served-by": ["x-served-by", "cache-iad2132-IAD"], "x-cache-hits": ["x-cache-hits", "1"], "date": ["date", "Wed, 02 Jul 2014 14:13:37 GMT"], "x-frame-options": ["x-frame-options", "SAMEORIGIN"], "content-type": ["content-type", "text/html; charset=utf-8"], "age": ["age", "1483"]}
Depending on exactly which headers you want, you can access them individually after this, but this will give you all the information contained in the headers, albeit in a slightly different format.
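As a rough illustration of accessing specific headers in that format, here is a minimal sketch. It does not touch the network: `store` is a hypothetical hand-built sample that only mimics the name -> (name, value) shape shown above, and `flat` and `wanted` are names made up for this example. Since `_store` is a private attribute of requests, its shape may differ between versions.

```python
import json

# Hypothetical sample shaped like the output above:
# header name -> (header name, value). Not fetched from a server.
store = {
    "content-type": ("content-type", "text/html; charset=utf-8"),
    "server": ("server", "nginx"),
    "age": ("age", "1483"),
}

# Each entry's value is already a (name, value) pair, so dict() can
# rebuild a flat name -> value mapping from them.
flat = dict(store.values())

# Pick out only the headers of interest before encoding.
wanted = {name: flat[name] for name in ("server", "age")}
print(json.dumps(wanted, sort_keys=True))
```

Flattening first keeps the JSON free of the repeated header name in every entry.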
If you prefer a different format, you can also convert your headers to a dictionary:
import requests, json
r = requests.get("https://www.python.org/")
print json.dumps(dict(r.headers))
{"content-length": "45682", "via": "1.1 varnish", "x-cache": "HIT", "accept-ranges": "bytes", "strict-transport-security": "max-age=63072000; includeSubDomains", "vary": "Cookie", "server": "nginx", "x-served-by": "cache-at50-ATL", "x-cache-hits": "5", "date": "Wed, 02 Jul 2014 14:08:15 GMT", "x-frame-options": "SAMEORIGIN", "content-type": "text/html; charset=utf-8", "age": "951"}
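The same idea should carry over to the urllib2 attempt in the question, since the object returned by .info() also exposes items(). The sketch below is written for Python 3 and runs offline. It uses an `email.message.Message` built by hand as a stand-in for the real response headers, and the file name `headers.json` and the temporary directory are assumptions for illustration only. It also writes the result to a file, which was the original goal.

```python
import json
import os
import tempfile
from email.message import Message

# Stand-in for the object returned by urlopen(...).info(); built by hand so
# the sketch runs without network access. Values are copied from the question.
header = Message()
header["Server"] = "nginx"
header["Content-Type"] = "text/html; charset=utf-8"
header["Content-Length"] = "45682"

# items() yields (name, value) pairs, which dict() turns into a plain mapping
# that json.dumps can serialize.
header_dict = dict(header.items())
json_header = json.dumps(header_dict, indent=2, sort_keys=True)

# Hypothetical output path; any writable location would do.
path = os.path.join(tempfile.mkdtemp(), "headers.json")
with open(path, "w") as fh:
    fh.write(json_header)

# Read the file back to confirm it holds valid JSON.
with open(path) as fh:
    loaded = json.load(fh)
print(loaded["Content-Type"])
```

One caveat with any dict-based conversion: if a response repeats a header name, a plain dictionary keeps only one value per name, so a list of pairs may be a safer shape in that case.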