UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3'

Problem description

I have an Excel spreadsheet that I'm reading in that contains some £ signs.

When I try to read it in using the xlrd module, I get the following error:

x = table.cell_value(row, col)
x = x.decode("ISO-8859-1")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)

If I rewrite this to x.encode('utf-8') it stops throwing an error, but unfortunately when I then write the data out somewhere else (as latin-1), the £ signs have all become garbled.

How can I fix this, and read the £ signs in correctly?

--- UPDATE ---

Some kind readers have suggested that I don't need to decode it at all, or that I can just encode it to Latin-1 when I need to. The problem with this is that I need to write the data to a CSV file eventually, and it seems to object to the raw strings.

If I don't encode or decode the data at all, then this happens (after I've added the string to an array called items):

for item in items:
    #item = [x.encode('latin-1') for x in item]
    cleancsv.writerow(item)
File "clean_up_barnet.py", line 104, in <module>
 cleancsv.writerow(item)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2022' in position 43: ordinal not in range(128)

I get the same error even if I uncomment the Latin-1 line.

Solution

Your code snippet says x.decode, but you're getting an encode error -- meaning x is Unicode already, so, to "decode" it, it must first be turned into a string of bytes (and that's where the default codec, ascii, comes in and fails). In your text you then say "if I rewrite it to x.encode"... which seems to imply that you do know x is Unicode.
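To make the hidden step visible, here is a minimal sketch that performs the implicit ascii encode explicitly. The value of x is a made-up stand-in for the cell value, and the snippet is written so that it also runs on Python 3, where the message shows '\xa3' without the u prefix.

```python
# Hypothetical stand-in for the value returned by table.cell_value(row, col).
x = u"\xa3100"

# On Python 2, calling .decode() on a unicode object first encodes it with
# the default codec (ascii). Doing that encode explicitly reproduces the error.
try:
    x.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)
```

The failure comes from the ascii encode, not from the ISO-8859-1 decode that the code appears to request.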

So what is it you're doing -- and what is it you mean to be doing -- encoding a unicode x to get a coded string of bytes, or decoding a string of bytes into a unicode object?
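The two directions can be sketched as follows. The variable names are illustrative only. The last step shows one plausible way the £ signs could end up garbled: bytes produced with UTF-8 but later read back as Latin-1.

```python
text = u"\xa3"                      # a unicode object holding the pound sign

# Encoding: unicode -> bytes, using a named codec.
latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")

# Decoding: bytes -> unicode, using the same codec that produced the bytes.
round_trip = latin1_bytes.decode("latin-1")

# Mixing codecs: UTF-8 bytes interpreted as Latin-1 give two wrong characters.
garbled = utf8_bytes.decode("latin-1")

print(repr(latin1_bytes), repr(utf8_bytes), repr(garbled))
```

Whichever codec is used to encode on the way out has to be the one the consumer of the data uses to decode.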

I find it unfortunate that you can call encode on a byte string, and decode on a unicode object, because I find it seems to lead users to nothing but confusion... but at least in this case you seem to manage to propagate the confusion (at least to me;-).

If, as it seems, x is unicode, then you never want to "decode" it -- you may want to encode it to get a byte string with a certain codec, e.g. latin-1, if that's what you need for some kind of I/O purposes (for your own internal program use I recommend sticking with unicode all the time -- only encode/decode if and when you absolutely need to produce, or receive, coded byte strings for input / output purposes).
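As a rough sketch of "encode only at the I/O boundary" for the CSV case: the rows, the file name and the choice of UTF-8 below are all assumptions, not taken from the question. The sketch assumes Python 3, where the file object does the encoding; on Python 2 the csv module expects byte strings, so each cell would be encoded just before writerow instead. It also checks that u'\u2022' (the bullet in the second traceback) has no Latin-1 representation, which is why a wider encoding is used here.

```python
import csv
import io
import os
import tempfile

# Hypothetical rows standing in for the spreadsheet data.
items = [[u"\xa3100", u"\u2022 first point"]]

# U+2022 (bullet) cannot be represented in Latin-1.
try:
    u"\u2022".encode("latin-1")
    bullet_fits_latin1 = True
except UnicodeEncodeError:
    bullet_fits_latin1 = False

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "clean.csv")

# Text stays unicode inside the program; encoding happens only on write.
with io.open(path, "w", encoding="utf-8", newline="") as handle:
    writer = csv.writer(handle)
    for item in items:
        writer.writerow(item)

# Read the file back with the same encoding to confirm nothing was garbled.
with io.open(path, "r", encoding="utf-8", newline="") as handle:
    rows_read = list(csv.reader(handle))

print(bullet_fits_latin1)
print(rows_read == items)
```

If the consumer of the file really requires Latin-1, characters such as the bullet would have to be replaced or dropped before writing, since that codec cannot hold them.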
