Encoding gives "'ascii' codec can't encode character … ordinal not in range(128)"

Published: 2016/11/19 13:06:47. Tags: python, django, unicode, character-encoding

Problem description

I am working through the Django RSS reader project here.

The RSS feed will read something like "OKLAHOMA CITY (AP) — James Harden let". The RSS feed's encoding reads encoding="UTF-8" so I believe I am passing utf-8 to markdown in the code snippet below. The em dash is where it chokes.

I get the Django error "'ascii' codec can't encode character u'\u2014' in position 109: ordinal not in range(128)", which is a UnicodeEncodeError. In the variables being passed I see "OKLAHOMA CITY (AP) \u2014 James Harden". The line of code that is not working is:

content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")

I am using markdown 2.0, django 1.1, and python 2.4.

What is the magic sequence of encoding and decoding that I need to do to make this work?

(In response to Prometheus' request. I agree the formatting helps)

So in views I add a smart_unicode line above the parsed_feed encoding line...

content = smart_unicode(content, encoding='utf-8', strings_only=False, errors='strict')
content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")

This pushes the problem to my models.py, where I have

def save(self, force_insert=False, force_update=False):
    if self.excerpt:
        self.excerpt_html = markdown(self.excerpt)
        # super save after this

If I change the save method to have...

def save(self, force_insert=False, force_update=False):
    if self.excerpt:
        encoded_excerpt_html = (self.excerpt).encode('utf-8')
        self.excerpt_html = markdown(encoded_excerpt_html)

I get the error "'ascii' codec can't decode byte 0xe2 in position 141: ordinal not in range(128)" because now it reads "\xe2\x80\x94" where the em dash was.

Solution

If the data that you are receiving is, in fact, encoded in UTF-8, then it should be a sequence of bytes -- a Python 'str' object, in Python 2.x.

You can verify this with an assertion:

assert isinstance(content, str)

Once you know that that's true, you can move to the actual encoding. Python doesn't do transcoding -- directly from UTF-8 to ASCII, for instance. You need to first turn your sequence of bytes into a Unicode string, by decoding it:

unicode_content = content.decode('utf-8')

(If you can trust parsed_feed.encoding, then use that instead of the literal 'utf-8'. Either way, be prepared for errors.)
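To make the decoding step concrete, here is a minimal sketch using the headline from the question. The sample bytes and the names `raw`, `text` and `decode_failed` are made up for illustration; the snippet is written so that it should run on Python 2.6+ as well as Python 3 (where `bytes` plays the role of Python 2's `str`), which is an assumption beyond the Python 2.4 setup in the question.

```python
# UTF-8 bytes as a feed might deliver them; \xe2\x80\x94 is the em dash.
raw = b'OKLAHOMA CITY (AP) \xe2\x80\x94 James Harden'

# Decoding turns the three-byte sequence into the single code point U+2014.
text = raw.decode('utf-8')

# Decoding with the wrong codec fails loudly instead of guessing.
try:
    raw.decode('ascii')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

The ASCII attempt fails on byte 0xe2, the same byte named in the second error message in the question.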

You can then take that string, and encode it in ASCII, substituting high characters with their XML entity equivalents:

xml_content = unicode_content.encode('ascii', 'xmlcharrefreplace')
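As a small illustration of what 'xmlcharrefreplace' does, the sketch below encodes the same headline twice: once with the default strict handler, which raises, and once with the replacement handler. The sample string and the `strict_failed` flag are illustrative only, and the snippet assumes Python 2.6+ or Python 3.

```python
unicode_content = u'OKLAHOMA CITY (AP) \u2014 James Harden'

# Strict encoding (the default) raises on the em dash.
try:
    unicode_content.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With 'xmlcharrefreplace', U+2014 becomes the numeric reference &#8212;
# (0x2014 is 8212 in decimal) and the result is pure ASCII bytes.
xml_content = unicode_content.encode('ascii', 'xmlcharrefreplace')
```

The result is a byte string that is safe to pass through any ASCII-only code path, and a browser or XML parser will render the reference as an em dash again.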

The full method, then, would look something like this:

try:
    content = content.decode(parsed_feed.encoding).encode('ascii', 'xmlcharrefreplace')
except UnicodeDecodeError:
    # Couldn't decode the incoming string -- possibly not encoded in utf-8
    # Do something here to report the error
    raise  # placeholder: re-raise until real error reporting is added
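Since the traceback in the question shows u'\u2014', the content may already be a Unicode string by the time it reaches this line, in which case decoding it again is the wrong step. One way to cover both cases is a small helper like the sketch below. The name `to_ascii_xml` and its signature are hypothetical, not part of Django or any feed library, and the code assumes Python 2.6+ or Python 3 (the `bytes` name does not exist in Python 2.4, where you would test against `str` instead).

```python
def to_ascii_xml(content, encoding='utf-8'):
    """Hypothetical helper: return ASCII bytes with XML character references.

    Accepts either encoded bytes or already-decoded text.
    """
    if isinstance(content, bytes):
        # Raw bytes from the feed: decode first. This raises
        # UnicodeDecodeError if the declared encoding is wrong.
        content = content.decode(encoding)
    # Text: replace every non-ASCII character with &#NNNN;.
    return content.encode('ascii', 'xmlcharrefreplace')


from_bytes = to_ascii_xml(b'AP \xe2\x80\x94 Harden')
from_text = to_ascii_xml(u'AP \u2014 Harden')
```

Both calls produce the same ASCII output, so the caller does not need to know which form the feed parser handed back. A wrong `encoding` still surfaces as a UnicodeDecodeError, which the caller can catch and report as in the try/except above.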
