Get a header with Python and convert it to JSON (requests - urllib2 - json)


Question

I'm trying to get the headers from a website and encode them as JSON so I can write them to a file. I've tried two different ways without success.

First, with urllib2 and json:

import urllib2
import json
host = ("https://www.python.org/")
header = urllib2.urlopen(host).info()
json_header = json.dumps(header)
print json_header

With this code I get the error:

TypeError: is not JSON serializable

So I tried to work around the issue by converting the object to a string first: json_header = str(header). That way the json.dumps call no longer fails, but the output is weird:

"Date: Wed, 02 Jul 2014 13:33:37 GMT\r\nServer: nginx\r\nContent-Type: text/html; charset=utf-8\r\nX-Frame-Options: SAMEORIGIN\r\nContent-Length: 45682\r\nAccept-Ranges: bytes\r\nVia: 1.1 varnish\r\nAge: 1263\r\nX-Served-By: cache-fra1220-FRA\r\nX-Cache: HIT\r\nX-Cache-Hits: 2\r\nVary: Cookie\r\nStrict-Transport-Security: max-age=63072000; includeSubDomains\r\nConnection: close\r\n"

Second, with requests:

import requests
r = requests.get("https://www.python.org/")
rh = r.headers
print rh

{'content-length': '45682', 'via': '1.1 varnish', 'x-cache': 'HIT', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=63072000; includeSubDomains', 'vary': 'Cookie', 'server': 'nginx', 'x-served-by': 'cache-fra1226-FRA', 'x-cache-hits': '14', 'date': 'Wed, 02 Jul 2014 13:39:33 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'text/html; charset=utf-8', 'age': '1619'}

This way the output looks more like JSON, but it's still not right (note the ' ' instead of " ", and other things like the = and ;). Evidently there's something (or a lot) I'm not doing correctly. I've read the documentation for these modules, but I can't work out how to solve this problem. Thank you for your help.

Recommended answer

There are more than a couple of ways to encode headers as JSON, but my first thought would be to convert the headers attribute to an actual dictionary instead of accessing it as a requests.structures.CaseInsensitiveDict:

import requests, json
r = requests.get("https://www.python.org/")
rh = json.dumps(r.headers.__dict__['_store'])
print rh

{'content-length': ('content-length', '45474'), 'via': ('via', '1.1 varnish'), 'x-cache': ('x-cache', 'HIT'), 'accept-ranges': ('accept-ranges', 'bytes'), 'strict-transport-security': ('strict-transport-security', 'max-age=63072000; includeSubDomains'), 'vary': ('vary', 'Cookie'), 'server': ('server', 'nginx'), 'x-served-by': ('x-served-by', 'cache-iad2132-IAD'), 'x-cache-hits': ('x-cache-hits', '1'), 'date': ('date', 'Wed, 02 Jul 2014 14:13:37 GMT'), 'x-frame-options': ('x-frame-options', 'SAMEORIGIN'), 'content-type': ('content-type', 'text/html; charset=utf-8'), 'age': ('age', '1483')}

Depending on exactly what you want from the headers, you can access specific ones after this. Either way, this gives you all the information contained in the headers, albeit in a slightly different format.
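As a rough illustration of that "slightly different format": in the requests versions I'm aware of, the private _store mapping keeps each header under a lowercased key, with an (original_name, value) pair as the value. The sketch below uses a hand-written stand-in for _store (the names and values are illustrative, not live data) so it runs without a network call, and it is written for Python 3. Since _store is a private attribute, its shape may differ between releases.

```python
import json

# Stand-in for r.headers.__dict__['_store']: each lowercased key maps to an
# (original_name, value) pair. These entries are illustrative, not live data.
store = {
    "content-length": ("Content-Length", "45474"),
    "server": ("Server", "nginx"),
    "via": ("Via", "1.1 varnish"),
}

# json.dumps serialises each tuple as a two-element JSON array.
nested = json.dumps(store, sort_keys=True)

# Unpack the pairs to get a flat {name: value} object instead.
flat = {name: value for name, value in store.values()}
flat_json = json.dumps(flat, sort_keys=True)

print(nested)
print(flat_json)
```

The flat form is usually easier to consume later, because each header value is a plain string rather than a nested array.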

If you prefer a different format, you can also convert your headers to a dictionary:

import requests, json
r = requests.get("https://www.python.org/")
print json.dumps(dict(r.headers))

{"content-length": "45682", "via": "1.1 varnish", "x-cache": "HIT", "accept-ranges": "bytes", "strict-transport-security": "max-age=63072000; includeSubDomains", "vary": "Cookie", "server": "nginx", "x-served-by": "cache-at50-ATL", "x-cache-hits": "5", "date": "Wed, 02 Jul 2014 14:08:15 GMT", "x-frame-options": "SAMEORIGIN", "content-type": "text/html; charset=utf-8", "age": "951"}
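To connect this back to the question: the single quotes in the earlier output come from printing a Python dict, which shows its repr rather than JSON, and the = and ; characters are ordinary content inside JSON string values, so they are fine as they are. The sketch below is one possible way to go from headers to a JSON file, written for Python 3 (where urllib2 became urllib.request). The helper name headers_to_json and the file name headers.json are made up for illustration. The header object is built offline from a raw string; my assumption is that r.headers from requests, or the object returned by urllib.request.urlopen(url).info(), could be passed in the same way, since both expose .items().

```python
import json
import os
import tempfile
from email import message_from_string


def headers_to_json(headers):
    """Serialise any header object exposing .items() as a JSON object string."""
    # dict() collapses repeated header names to the last value seen.
    return json.dumps(dict(headers.items()), indent=2, sort_keys=True)


# Offline stand-in for a live response: parse a raw header block into an
# email.message.Message, the base class of Python 3's http.client.HTTPMessage.
raw = (
    "Server: nginx\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Via: 1.1 varnish\r\n"
    "\r\n"
)
message = message_from_string(raw)

text = headers_to_json(message)

# Write the JSON to a file (a temporary directory here; any writable path works).
path = os.path.join(tempfile.mkdtemp(), "headers.json")
with open(path, "w") as fh:
    fh.write(text)

print(text)
```

Converting to a plain dict first keeps the helper independent of which HTTP library produced the headers, at the cost of dropping duplicates when a header name appears more than once.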
