Python csv: UnicodeDecodeError

Question

I'm reading in a file with Python's csv module, and have Yet Another Encoding Question (sorry, there are so many on here).

In the CSV file, there are £ signs. After reading the row in and printing it, they have become \xa3.

Trying to convert them to Unicode produces a UnicodeDecodeError:

row = [unicode(x.strip()) for x in row]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 0: ordinal not in range(128)

I have been reading the csv documentation and the numerous other questions about this on StackOverflow. I think that £ becoming \xa3 in ASCII means that the original CSV file is in UTF-8.

(Incidentally, is there a quick way to check the encoding of a CSV file?)

If it's in UTF-8, then shouldn't the csv module be able to cope with it? It seems to be transforming all the symbols into ASCII, even though the documentation claims it accepts UTF-8.

I've tried adding a unicode_csv_reader function as described in the csv examples, but it doesn't help.

---- EDIT -----

I should clarify one thing. I have seen this question, which looks very similar. But adding the unicode_csv_reader function defined there produces a different error instead:

yield [unicode(cell, 'utf-8') for cell in row]
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa3 in position 8: unexpected code byte

So maybe my file isn't UTF8 after all? How can I tell?

Recommended answer

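Try using "ISO-8859-1" (Latin-1) as the encoding. It seems like you are dealing with extended ASCII, not UTF-8: in Latin-1 the single byte 0xa3 is £, whereas UTF-8 encodes £ as the two bytes 0xc2 0xa3, so a lone 0xa3 is not valid UTF-8.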
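On the side question of how to tell which encoding a file uses, one rough approach is to try decoding the raw bytes with a few candidate encodings and see which ones succeed. The sketch below is written for Python 3 (the snippets in this thread are Python 2); the function name `guess_encoding` and the candidate list are illustrative assumptions, not part of the csv module.

```python
def guess_encoding(data, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes `data` without error.

    Hypothetical helper for illustration. Latin-1 accepts every byte
    sequence, so it only makes sense as the last candidate (a fallback).
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# A lone 0xa3 byte is not valid UTF-8, so the fallback is chosen.
print(guess_encoding(b"12\xa3"))
# The UTF-8 encoding of the pound sign is the two bytes 0xc2 0xa3.
print(guess_encoding(b"12\xc2\xa3"))
```

A successful decode does not prove the guess is right; it only rules out encodings that cannot represent the bytes at all.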

Edit:

Here's some simple code that deals with extended ASCII:

>>> s = "La Pe\xf1a"
>>> print s
La Pe±a
>>> print s.decode("latin-1")
La Peña
>>>

Even better, dealing with the exact character that is giving you problems:

>>> s = "12\xa3"
>>> print s.decode("latin-1")
12£
>>>
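For readers on Python 3, the same idea can be applied at the point where the file is opened, so the csv module receives already-decoded text. This is a minimal sketch assuming the data really is Latin-1; the helper name `read_rows` and the in-memory sample are made up for illustration.

```python
import csv
import io


def read_rows(binary_stream, encoding="latin-1"):
    """Decode a binary stream with the given encoding and parse it as CSV.

    Hypothetical helper: returns a list of rows with each cell stripped.
    """
    # newline="" lets the csv module handle line endings itself.
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return [[cell.strip() for cell in row] for row in csv.reader(text)]


# In-memory stand-in for a Latin-1 encoded CSV file containing a pound sign.
sample = io.BytesIO(b"item,price\r\nwidget, 12\xa3\r\n")
rows = read_rows(sample)
print(rows)
```

With a real file, passing `encoding="latin-1"` and `newline=""` to the built-in `open()` and handing the result to `csv.reader` should have the same effect.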
