UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 3 2: ordinal not in range(128)


Problem description

I am parsing an XLS file using xlrd. Most things are working fine. I have a dictionary where the keys are strings and the values are lists of strings. All the keys and values are Unicode. I can print most of the keys and values using str(), but some values contain the Unicode character \u2013, for which I get the above error.

I suspect that this is happening because this is Unicode embedded in Unicode and the Python interpreter cannot decode it. So how can I get rid of this error?

Solution

You can print Unicode objects directly; you don't need to wrap them in str().

Assuming you really want a str:

When you do str(u'\u2013') you are trying to convert the Unicode string to an 8-bit string. To do this you need an encoding, a mapping between Unicode data and 8-bit data. What str() does is use the system default encoding, which under Python 2 is ASCII. ASCII covers only the first 128 code points of Unicode, that is \u0000 to \u007F. The result is that you get the above error: the ASCII codec simply has no byte for \u2013 (it's an en dash, by the way).
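
The failure can be reproduced in a few lines. This is only a sketch, and it assumes Python 3 rather than the Python 2 of the question: there, str holds Unicode text and bytes plays the role of Python 2's 8-bit str, so the implicit ASCII conversion is written out as an explicit .encode('ascii'). The helper name try_ascii is made up for illustration.

```python
def try_ascii(text):
    """Return the ASCII-encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode('ascii')
    except UnicodeEncodeError as exc:
        return str(exc)

plain = try_ascii('abc')        # every character is below U+0080
dashed = try_ascii('a\u2013b')  # contains U+2013, outside the ASCII range
print(plain)
print(dashed)
```

The second call should report the same "ordinal not in range(128)" message as in the question, since the codec and the character are the same.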

You therefore need to specify which encoding you want to use. Common ones are ISO-8859-1, better known as Latin-1, which covers the first 256 code points; UTF-8, which can encode all code points by using a variable-length encoding; CP1252, which is common on Windows; and various Chinese and Japanese encodings.
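
To see how the encodings listed above differ for this particular character, here is a small comparison. It is a sketch under the same assumption of Python 3, where .encode() returns bytes; the names DASH and encode_or_none are hypothetical and exist only for this example.

```python
DASH = '\u2013'  # EN DASH, the character from the error message

def encode_or_none(text, encoding):
    """Encode text with the given codec; return None if the codec cannot represent it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

# Try the same character with each codec mentioned in the answer.
results = {name: encode_or_none(DASH, name)
           for name in ('ascii', 'latin-1', 'cp1252', 'utf-8')}
for name, value in results.items():
    print(name, value)
```

ASCII and Latin-1 cannot represent the character at all, CP1252 has a single byte for it, and UTF-8 uses three bytes, so which encoding is right depends on what will read the output.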

You use them like this:

u'\u2013'.encode('utf8')

The result is a str containing a sequence of bytes that is the UTF-8 representation of the character in question:

'\xe2\x80\x93'

And you can print it (on a terminal that expects UTF-8):

>>> print '\xe2\x80\x93'
–
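
Applied to the data structure from the question, a dictionary whose keys are strings and whose values are lists of strings, the same explicit encoding step could look like the sketch below. It again assumes Python 3, and both the helper encode_table and the sample data are hypothetical stand-ins for whatever xlrd actually returned.

```python
def encode_table(table, encoding='utf-8'):
    """Encode every key and every list item of a {text: [text, ...]} dict."""
    return {key.encode(encoding): [item.encode(encoding) for item in items]
            for key, items in table.items()}

# Hypothetical data standing in for the parsed spreadsheet.
rows = {'range': ['1\u20135', 'n/a']}
encoded = encode_table(rows)
print(encoded)
```

Passing a different codec name as the second argument selects another encoding, with the caveat that codecs unable to represent a character will raise the same UnicodeEncodeError.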
