Is it possible for Python to read non-ASCII text from a file?
Problem description
I have a UTF-8 encoded .txt file and have trouble reading it into Python. I have a large number of such files, so converting them would be cumbersome.
So if I read the file in via
for line in file_obj:
    ...
I get the following error:
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 291: ordinal not in range(128)
I guess x.decode("utf-8") wouldn't work, since the error occurs before the line is even read in.
Recommended answer
There are two options:
- Specify the encoding when opening the file, instead of using the default.
- Open the file in binary mode, and explicitly decode from bytes to str.
The first is obviously the simpler one. You don't show how you're opening the file, but assuming your code looks like this:
with open(path) as file_obj:
    for line in file_obj:
        ...
Do this instead:
with open(path, encoding='utf-8') as file_obj:
    for line in file_obj:
        ...
That's it.
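For completeness, the second option (binary mode plus an explicit decode) might look roughly like the sketch below. The helper name read_lines_binary and the temporary sample file are assumptions made up for illustration; they are not part of the original question. The sample text contains the character ©, which is encoded in UTF-8 as the bytes 0xc2 0xa9, the same leading byte that appears in the error above.

```python
import os
import tempfile


def read_lines_binary(path, encoding="utf-8"):
    """Yield str lines from a file opened in binary mode.

    Hypothetical helper: each line arrives as bytes and is decoded explicitly.
    """
    with open(path, "rb") as file_obj:
        for raw_line in file_obj:  # raw_line is a bytes object
            yield raw_line.decode(encoding)


# Build a small UTF-8 sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("caf\u00e9 \u00a9\nplain\n".encode("utf-8"))
    sample_path = tmp.name

lines = list(read_lines_binary(sample_path))
os.remove(sample_path)
```

Note that binary mode does no newline translation, so a file with Windows line endings would keep its "\r\n" in each decoded line. That is one more reason the encoding= argument is usually the more convenient choice.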
As the docs explain, if you don't specify an encoding in text mode:
The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any encoding supported by Python can be used.
In some cases (e.g., any OS X, or Linux with an appropriate configuration), locale.getpreferredencoding() will always be 'UTF-8'. But it will obviously never be "automatically whatever's right for any file I might open". So if you know a file is UTF-8, you should specify it explicitly.
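To see which default a given machine would apply, a quick check along these lines may help. The function name default_text_encoding is a made-up wrapper for illustration. The printed value depends on the platform and locale settings, so no particular output is assumed here.

```python
import locale


def default_text_encoding():
    """Return the locale's preferred encoding, which the docs quoted above
    name as the default for text-mode open() when no encoding is given."""
    # Passing False asks for the current setting without changing the locale.
    return locale.getpreferredencoding(False)


print(default_text_encoding())
```

If this prints something like an ASCII codec name, as the traceback in the question suggests, then open(path) without encoding= will fail on any byte above 0x7f.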