Python csv: UnicodeDecodeError
I'm reading in a file with Python's csv module, and have Yet Another Encoding Question (sorry, there are so many on here).
In the CSV file, there are £ signs. After reading the row in and printing it, they have become \xa3.
Trying to encode them as Unicode produces a UnicodeDecodeError:
row = [unicode(x.strip()) for x in row]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 0: ordinal not in range(128)
I have been reading the csv documentation and the numerous other questions about this on StackOverflow. I think that £ becoming \xa3 in ASCII means that the original CSV file is in UTF-8.
(Incidentally, is there a quick way to check the encoding of a CSV file?)
If it's in UTF-8, then shouldn't the csv module be able to cope with it? It seems to be transforming all the symbols into ASCII, even though the documentation claims it accepts UTF-8.
I've tried adding a unicode_csv_reader function as described in the csv examples, but it doesn't help.
---- EDIT -----
I should clarify one thing. I have seen this question, which looks very similar. But adding the unicode_csv_reader function defined there produces a different error instead:
yield [unicode(cell, 'utf-8') for cell in row]
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa3 in position 8: unexpected code byte
So maybe my file isn't UTF8 after all? How can I tell?
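Try using "ISO-8859-1" (Latin-1) as your encoding. It looks like you are dealing with extended ASCII, a single-byte encoding, rather than UTF-8: in UTF-8 the pound sign is the two-byte sequence 0xC2 0xA3, whereas a lone 0xA3 byte is how ISO-8859-1 encodes it.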
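As for how to tell, one rough check is simply to try decoding the raw bytes with each candidate encoding and see which one succeeds. The sketch below assumes Python 3 (the question itself uses Python 2), and guess_encoding is a hypothetical helper name, not part of the csv module. Note that latin-1 accepts every possible byte, so it only makes sense as the last fallback; this is a heuristic, not a reliable detector.

```python
def guess_encoding(data, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes `data` without error.

    Hypothetical helper for illustration. Order matters: latin-1 can decode
    any byte sequence, so stricter encodings must be tried first.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# A lone 0xa3 byte is not valid UTF-8, so the check falls through to latin-1.
print(guess_encoding(b"12\xa3"))
# 0xc2 0xa3 is how UTF-8 encodes the pound sign, so utf-8 is accepted.
print(guess_encoding(b"12\xc2\xa3"))
```

In practice you would read the first few kilobytes of the file in binary mode and pass them to a check like this.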
Edit:
Here's some simple code that deals with extended ASCII:
>>> s = "La Pe\xf1a"
>>> print s
La Pe±a
>>> print s.decode("latin-1")
La Peña
>>>
Even better, dealing with the exact character that is giving you problems:
>>> s = "12\xa3"
>>> print s.decode("latin-1")
12£
>>>
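If you are on Python 3 instead, the same idea applies at the file level: decode the bytes as Latin-1 while reading, and the csv module then yields ordinary text. This is a minimal sketch under that assumption; read_rows and the sample data are made up for illustration, and an in-memory buffer stands in for the real file.

```python
import csv
import io


def read_rows(binary_file, encoding="latin-1"):
    """Decode a binary CSV stream with `encoding` and return stripped rows.

    Hypothetical helper; the latin-1 default is an assumption based on the
    0xa3 byte seen in the question.
    """
    # newline="" lets the csv module handle line endings itself.
    text = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    return [[cell.strip() for cell in row] for row in csv.reader(text)]


# Stand-in for a Latin-1 encoded file containing a pound sign (byte 0xa3).
sample = io.BytesIO(b"item,price\r\ntea, \xa312\r\n")
rows = read_rows(sample)
print(rows)
```

With a real file, open(path, encoding="latin-1", newline="") gives you the same kind of text stream to pass to csv.reader.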