How do I post non-ASCII characters using httplib when content-type is "application/xml"


Problem description


I've implemented a Pivotal Tracker API module in Python 2.7. The Pivotal Tracker API expects POST data to be an XML document and "application/xml" to be the content type.

My code uses urllib2/httplib to post the document as shown:

    request = urllib2.Request(self.url, xml_request.toxml('utf-8') if xml_request else None, self.headers)
    obj = parse_xml(self.opener.open(request))

This yields an exception when the XML text contains non-ASCII characters:

    File "/usr/lib/python2.7/httplib.py", line 951, in endheaders
      self._send_output(message_body)
    File "/usr/lib/python2.7/httplib.py", line 809, in _send_output
      msg += message_body
    exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 89: ordinal not in range(128)

As near as I can see, httplib._send_output is creating an ASCII string for the message payload, presumably because it expects the data to be URL encoded (application/x-www-form-urlencoded). It works fine with application/xml as long as only ASCII characters are used.

Is there a straightforward way to post application/xml data containing non-ASCII characters, or am I going to have to jump through hoops (e.g. using Twisted and a custom producer for the POST payload)?

Solution

You're mixing Unicode and bytestrings.

    >>> msg = u'abc'  # Unicode string
    >>> message_body = b'\xc5'  # bytestring
    >>> msg += message_body
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)

To fix it, make sure the contents of self.headers are properly encoded, i.e., every key and value in the headers is a bytestring:

    self.headers = dict((k.encode('ascii') if isinstance(k, unicode) else k,
                         v.encode('ascii') if isinstance(v, unicode) else v)
                        for k, v in self.headers.items())

Note: the character encoding of the headers has nothing to do with the character encoding of the body, i.e., the XML text can be encoded independently (from the HTTP message's point of view it is just an octet stream).

The same goes for self.url: if it has the unicode type, convert it to a bytestring (using the 'ascii' character encoding).


An HTTP message consists of a start-line, headers, an empty line, and possibly a message body. self.headers supplies the headers; self.url is used for the start-line (which also carries the HTTP method) and probably for the Host header (if the client speaks HTTP/1.1); the XML text goes into the message body as a binary blob.
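To make that layout concrete, here is a minimal sketch of how the pieces line up on the wire. `build_request` is a hypothetical helper written only for illustration (it is not how httplib assembles a request internally), and the host, path, and XML payload are made up. The point is that once every piece is a bytestring, concatenation never triggers an implicit ASCII decode.

```python
def build_request(method, target, headers, body):
    """Join start-line, header lines, an empty line, and the body.

    Every argument is expected to be a bytestring already, so no implicit
    ASCII decoding can happen during concatenation.
    """
    lines = [method + b" " + target + b" HTTP/1.1"]
    for name, value in headers:
        lines.append(name + b": " + value)
    # An empty line (CRLF CRLF) separates the headers from the body.
    return b"\r\n".join(lines) + b"\r\n\r\n" + body


# UTF-8 encoded XML body; 0xc5 0x81 is U+0141 (L with stroke).
body = b'<?xml version="1.0" encoding="utf-8"?><name>\xc5\x81</name>'
headers = [
    (b"Host", b"tracker.example"),  # hypothetical host
    (b"Content-Type", b"application/xml"),
    (b"Content-Length", str(len(body)).encode("ascii")),
]
raw = build_request(b"POST", b"/stories", headers, body)
```

The headers stay pure ASCII while the body carries arbitrary octets, which is why the two encodings can be chosen independently.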

It is always safe to use ASCII encoding for self.url (IDNA can be used for non-ASCII domain names, and its result is also ASCII).
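As a small illustration of the IDNA remark, Python's built-in `idna` codec turns a Unicode host name into its ASCII form. `host_to_ascii` and the domain below are hypothetical names used only for this sketch; it covers the host part alone and assumes any path or query has already been percent-encoded.

```python
def host_to_ascii(host):
    """Return the ASCII (IDNA) form of a possibly non-ASCII host name."""
    if isinstance(host, bytes):
        return host  # already a bytestring; assumed to be ASCII
    return host.encode("idna")


# u"\xfc" is the letter u with diaeresis.
ascii_host = host_to_ascii(u"b\xfccher.example")
```

Note that the built-in codec implements the older IDNA 2003 rules, which is usually sufficient for common domain names.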

Here's what RFC 7230 says about the character encoding of HTTP headers:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

To convert XML to a bytestring, see the application/xml encoding considerations:

The use of UTF-8, without a BOM, is RECOMMENDED for all XML MIME entities.
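A minimal sketch of producing such a body, assuming the document is built with `xml.dom.minidom` (the question's `toxml('utf-8')` call suggests this, but it is an assumption). The element names and text are made up for illustration.

```python
from xml.dom import minidom

doc = minidom.Document()
story = doc.createElement("story")  # hypothetical element names
name = doc.createElement("name")
# Text containing non-ASCII characters; U+0141 encodes to 0xc5 0x81 in UTF-8.
name.appendChild(doc.createTextNode(u"\u0141\xf3d\u017a"))
story.appendChild(name)
doc.appendChild(story)

# Passing an encoding makes toxml() return an encoded bytestring
# with a matching XML declaration and no BOM.
body = doc.toxml("utf-8")
```

The resulting `body` can be passed as the request data as-is; it should not be decoded or concatenated with Unicode strings afterwards.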
