How do I post non-ASCII characters using httplib when content-type is "application/xml"
Problem description
I've implemented a Pivotal Tracker API module in Python 2.7. The Pivotal Tracker API expects POST data to be an XML document and "application/xml" to be the content type.
My code uses urllib2/httplib to post the document, as shown:
request = urllib2.Request(self.url, xml_request.toxml('utf-8') if xml_request else None, self.headers)
obj = parse_xml(self.opener.open(request))
This yields an exception when the XML text contains non-ASCII characters:
  File "/usr/lib/python2.7/httplib.py", line 951, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 809, in _send_output
    msg += message_body
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 89: ordinal not in range(128)
As near as I can see, httplib._send_output is creating an ASCII string for the message payload, presumably because it expects the data to be URL encoded (application/x-www-form-urlencoded). It works fine with application/xml as long as only ASCII characters are used.
Is there a straightforward way to post application/xml data containing non-ASCII characters, or am I going to have to jump through hoops (e.g. using Twisted and a custom producer for the POST payload)?
You're mixing Unicode and bytestrings.
>>> msg = u'abc'  # Unicode string
>>> message_body = b'\xc5'  # bytestring
>>> msg += message_body
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)
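For contrast, here is a minimal sketch of the same concatenation once both operands are bytestrings. The variable names simply mirror the session above. Encoding the text side first means no implicit ASCII decoding of the body is attempted:

```python
# Encode the text part explicitly so that both operands are bytestrings.
msg = u'abc'.encode('ascii')
message_body = b'\xc5'  # arbitrary non-ASCII byte, as in the session above

# Bytes + bytes: no implicit decoding takes place, so no UnicodeDecodeError.
msg += message_body
```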
To fix it, make sure that the content of self.headers is properly encoded, i.e., all keys and values in the headers should be bytestrings:
# Encode any unicode header names/values to ASCII bytestrings (Python 2)
self.headers = dict((k.encode('ascii') if isinstance(k, unicode) else k,
                     v.encode('ascii') if isinstance(v, unicode) else v)
                    for k, v in self.headers.items())
Note: the character encoding of the headers has nothing to do with the character encoding of the body, i.e., the XML text can be encoded independently (it is just an octet stream from the HTTP message's point of view).
The same goes for self.url: if it has the unicode type, convert it to a bytestring (using the 'ascii' character encoding).
An HTTP message consists of a start-line, headers, an empty line, and possibly a message body. self.headers is used for the headers; self.url is used for the start-line (the HTTP method also goes there) and probably for the Host header (if the client speaks HTTP/1.1); the XML text goes into the message body (as a binary blob).
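To make that layout concrete, here is a rough sketch that assembles a request by hand. It is only an illustration, not how httplib is implemented; build_http_message, the /stories path, and the example.com host are hypothetical names chosen for the example. The point is that the start-line and headers are encoded as ASCII, while the body is appended as opaque bytes in whatever encoding it already has:

```python
def build_http_message(method, path, host, headers, body):
    """Assemble a raw HTTP/1.1 request as one bytestring (illustrative only)."""
    # Start-line and Host header come first, followed by the remaining headers.
    lines = [u'%s %s HTTP/1.1' % (method, path), u'Host: %s' % host]
    lines.extend(u'%s: %s' % (k, v) for k, v in sorted(headers.items()))
    # The head of the message is plain ASCII text.
    head = u'\r\n'.join(lines).encode('ascii')
    # An empty line separates the head from the body; the body is raw bytes.
    return head + b'\r\n\r\n' + body


# The body is encoded independently of the headers (UTF-8 here).
body = u'<name>\u0141\xf3d\u017a</name>'.encode('utf-8')
message = build_http_message(
    u'POST', u'/stories', u'example.com',
    {u'Content-Type': u'application/xml', u'Content-Length': len(body)},
    body)
```

Because every piece is a bytestring by the time the parts are joined, the non-ASCII bytes in the body never pass through an ASCII codec.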
It is always safe to use ASCII encoding for self.url (IDNA can be used for non-ASCII domain names; the result is also ASCII).
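As a small sketch of both cases, the snippet below encodes a plain URL with 'ascii' and a non-ASCII host name with Python's built-in 'idna' codec. The URL and the host name are made-up examples, not values from the question:

```python
# A URL that contains only ASCII characters can be encoded directly.
url = u'http://example.com/api/stories'  # hypothetical URL
url_bytes = url.encode('ascii')

# A non-ASCII host name goes through IDNA first; the result is pure ASCII.
host = u'b\xfccher.example'  # hypothetical internationalized domain name
host_bytes = host.encode('idna')
```

Here host_bytes is b'xn--bcher-kva.example', which can be placed in the URL or the Host header like any other ASCII bytestring.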
Here's what RFC 7230 says about the character encoding of HTTP headers:
Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.
To convert XML to a bytestring, see the application/xml encoding considerations:
The use of UTF-8, without a BOM, is RECOMMENDED for all XML MIME entities.
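A minimal sketch of that conversion, assuming the XML request is an xml.dom.minidom document (the toxml('utf-8') call in the question suggests so, but that is a guess). The story/name elements are placeholder content:

```python
import codecs
from xml.dom import minidom

# Build a small document containing non-ASCII text (placeholder content).
source = u'<story><name>\u0141\xf3d\u017a</name></story>'
doc = minidom.parseString(source.encode('utf-8'))

# Passing an encoding makes toxml() return a bytestring in that encoding.
xml_bytes = doc.toxml('utf-8')

# Sanity check: the serialized document does not start with a UTF-8 BOM.
has_bom = xml_bytes.startswith(codecs.BOM_UTF8)
```

The resulting xml_bytes can be passed as the request data as-is; no further encoding step is needed.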