How do I post non-ASCII characters using httplib when content-type is "application/xml"

Views: 158 | Published: 2017/5/28 21:29:16 | Tags: python, django, urllib2, httplib, pivotaltracker

Problem description

I've implemented a Pivotal Tracker API module in Python 2.7. The Pivotal Tracker API expects POST data to be an XML document and "application/xml" to be the content type.

My code uses urllib2/httplib to post the document as shown:

    request = urllib2.Request(self.url, xml_request.toxml('utf-8') if xml_request else None, self.headers)
    obj = parse_xml(self.opener.open(request))

This yields an exception when the XML text contains non-ASCII characters:

    File "/usr/lib/python2.7/httplib.py", line 951, in endheaders
      self._send_output(message_body)
    File "/usr/lib/python2.7/httplib.py", line 809, in _send_output
      msg += message_body
    exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 89: ordinal not in range(128)

As near as I can see, httplib._send_output is creating an ASCII string for the message payload, presumably because it expects the data to be URL encoded (application/x-www-form-urlencoded). It works fine with application/xml as long as only ASCII characters are used.

Is there a straightforward way to post application/xml data containing non-ASCII characters, or am I going to have to jump through hoops (e.g. using Twisted and a custom producer for the POST payload)?

Solution

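You're mixing Unicode and bytestrings. In Python 2, concatenating the two makes the interpreter implicitly decode the bytestring as ASCII, which fails on any byte above 0x7F: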

    >>> msg = u'abc' # Unicode string
    >>> message_body = b'\xc5' # bytestring
    >>> msg += message_body
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)

To fix it, make sure that the contents of self.headers are properly encoded, i.e., all keys and values in the headers should be bytestrings:

    # Python 2: encode any unicode header names and values to ASCII bytestrings.
    self.headers = dict((k.encode('ascii') if isinstance(k, unicode) else k,
                         v.encode('ascii') if isinstance(v, unicode) else v)
                        for k, v in self.headers.items())

Note: the character encoding of the headers has nothing to do with the character encoding of the body, i.e., the XML text can be encoded independently (it is just an octet stream from the HTTP message's point of view).

The same goes for self.url: if it has the unicode type, convert it to a bytestring (using the 'ascii' character encoding).

An HTTP message consists of a start-line, headers, an empty line, and possibly a message body. self.headers is used for the headers; self.url is used for the start-line (which also carries the HTTP method) and probably for the Host header (if the client speaks HTTP/1.1); the XML text goes into the message body as a binary blob.
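To make that layout concrete, here is a minimal sketch that assembles the parts into one bytestring. It is only an illustration, not what httplib does internally: `build_request` is a hypothetical helper, the path `/example` and the host `tracker.example` are made up, and the header names and values are assumed to be pure-ASCII text.

```python
def build_request(method, path, headers, body):
    """Assemble a raw HTTP/1.1 request as one bytestring (illustrative only)."""
    # Start-line and headers are text; encode them as ASCII before joining.
    lines = [(u"%s %s HTTP/1.1" % (method, path)).encode("ascii")]
    for name, value in sorted(headers.items()):
        lines.append((u"%s: %s" % (name, value)).encode("ascii"))
    head = b"\r\n".join(lines) + b"\r\n\r\n"  # empty line ends the header section
    # The body is already bytes (e.g. UTF-8 XML) and is appended untouched.
    return head + body


xml_body = u"<name>\u0141\u00f3d\u017a</name>".encode("utf-8")
message = build_request(
    u"POST",
    u"/example",  # hypothetical path
    {u"Content-Type": u"application/xml", u"Host": u"tracker.example"},
    xml_body,
)
```

Because every piece is a bytestring before concatenation, no implicit ASCII decoding takes place, whatever bytes the body contains.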

It is always safe to use ASCII encoding for self.url (IDNA can be used for non-ASCII domain names, and its result is also ASCII).
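As a rough sketch of how a Unicode URL could be reduced to ASCII: the host goes through Python's built-in `idna` codec, and the path is percent-encoded as UTF-8. The percent-encoding step is an assumption added here (the answer only mentions IDNA), `url_to_ascii` is a hypothetical helper, and the example URL is made up. The sketch assumes an absolute URL, ignores user-info, and leaves the query string unchanged.

```python
try:
    from urllib.parse import urlsplit, urlunsplit, quote  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit  # Python 2
    from urllib import quote


def url_to_ascii(url):
    """Return an ASCII-only form of an absolute URL (illustrative sketch)."""
    parts = urlsplit(url)
    # IDNA turns a non-ASCII host name into its ASCII ("xn--") form.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else "%s:%d" % (host, parts.port)
    # Assumption: non-ASCII path characters are percent-encoded as UTF-8;
    # "%" is kept safe so already-encoded sequences are not encoded twice.
    path = quote(parts.path.encode("utf-8"), safe="/%")
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


ascii_url = url_to_ascii(u"http://b\u00fccher.example/stories/\u0142")
url_bytes = ascii_url.encode("ascii")  # succeeds: only ASCII remains
```

In Python 2.7, the resulting `url_bytes` bytestring is what would be passed on as self.url.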

Here's what RFC 7230 says about the character encoding of HTTP headers:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

To convert XML to a bytestring, see the application/xml encoding considerations:

The use of UTF-8, without a BOM, is RECOMMENDED for all XML MIME entities.
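The call xml_request.toxml('utf-8') in the question suggests an xml.dom.minidom document. Assuming that is the case, the following sketch shows that passing an encoding yields a UTF-8 bytestring without a BOM. The element names `story` and `name` and the sample text are made up for illustration.

```python
import codecs
from xml.dom import minidom

doc = minidom.Document()
story = doc.createElement("story")
name = doc.createElement("name")
name.appendChild(doc.createTextNode(u"\u0141\u00f3d\u017a"))  # non-ASCII text
story.appendChild(name)
doc.appendChild(story)

# Passing an encoding makes toxml() return an encoded bytestring
# and adds a matching encoding declaration to the XML prolog.
body = doc.toxml("utf-8")

# Check that the serialized document does not start with a UTF-8 BOM.
has_bom = body.startswith(codecs.BOM_UTF8)
```

A `body` produced this way is already a bytestring, so it can be handed over as the request data as is; only the headers and the URL still need to be bytestrings.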
