Switching to Python 3 causing UnicodeDecodeError
Problem description
I've just added a Python 3 interpreter to Sublime, and the following code stopped working:
    for directory in directoryList:
        fileList = os.listdir(directory)
        for filename in fileList:
            filename = os.path.join(directory, filename)
            currentFile = open(filename, 'rt')
            for line in currentFile:  ## Here comes the exception.
                currentLine = line.split(' ')
                for word in currentLine:
                    if word.lower() not in bigBagOfWords:
                        bigBagOfWords.append(word.lower())
            currentFile.close()
I get the following exception:
      File "/Users/Kuba/Desktop/DictionaryCreator.py", line 11, in <module>
        for line in currentFile:
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 305: ordinal not in range(128)
I found this rather strange, because as far as I know Python 3 is supposed to support UTF-8 everywhere. What's more, the exact same code works with no problems on Python 2.7. I've read about adding the environment variable PYTHONIOENCODING, and I tried it, but to no avail. (However, it appears that adding an environment variable is not that easy in OS X Mavericks, so maybe I did something wrong when adding the variable? I modified /etc/launchd.conf.)
Python 3 decodes text files when reading. The default encoding is taken from locale.getpreferredencoding(False), which evidently returns 'ASCII' for your setup. See the open() function documentation:
    In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
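To see which encoding a given interpreter would fall back to, you can ask the locale module directly. This is only a small sketch; the variable names are illustrative, and the value printed depends entirely on how the interpreter was launched (a Sublime build system may well see a different environment than a terminal does).

```python
import codecs
import locale

# Encoding that open() uses in text mode when no encoding= argument is given.
default_text_encoding = locale.getpreferredencoding(False)

# Locales report many spellings (for example 'US-ASCII' or 'ANSI_X3.4-1968');
# codecs.lookup() maps them to Python's canonical codec name.
canonical_name = codecs.lookup(default_text_encoding).name

print("locale encoding:", default_text_encoding)
print("Python codec:   ", canonical_name)
```

If the codec printed is 'ascii', any file containing a byte above 0x7f will fail in the way shown in the traceback.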
Instead of relying on a system setting, you should open your text files using an explicit codec:
    currentFile = open(filename, 'rt', encoding='latin1')

where you set the encoding parameter to match the file you are reading.
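Why the codec has to match the file can be shown with a throwaway file. The sketch below assumes, purely for illustration, that the data is UTF-8 text containing a combining accent, which happens to encode to a 0xcc byte like the one in the traceback; the asker's real files may use a different encoding. The helper read_with is a hypothetical name, not part of any library.

```python
import os
import tempfile

# Hypothetical sample data: "cafe" plus a combining acute accent (U+0301).
# Encoded as UTF-8 this is b'cafe\xcc\x81\n', so it contains the byte 0xcc.
sample = "cafe\u0301\n".encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name


def read_with(encoding):
    """Read the whole sample file, decoding it with the given codec."""
    with open(path, "rt", encoding=encoding) as handle:
        return handle.read()


# ASCII cannot represent bytes above 0x7f, so decoding fails.
try:
    read_with("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

utf8_text = read_with("utf-8")     # decodes to the intended text
latin1_text = read_with("latin1")  # never fails, but yields the wrong characters

os.remove(path)
```

Note that latin1 maps every possible byte to some character, so it silences the exception for any input; if the file is not really Latin-1, the result is garbled text rather than an error.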
Python 3 supports UTF-8 as the default for source code, but that default is separate from the encoding used to decode the files you open.
You may want to read up on Python 3 and Unicode in the Unicode HOWTO, which explains both source code encoding and reading and writing Unicode data.
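Putting the advice together, the loop from the question might be restructured roughly as follows. This is a sketch, not the asker's code: collect_words is a hypothetical name, the default of 'utf-8' is an assumption about the files, and it deliberately swaps the list for a set and line.split(' ') for line.split(), so newlines and empty strings no longer end up among the words.

```python
import os
import tempfile


def collect_words(directories, encoding="utf-8"):
    """Return the set of lower-cased words in the files directly inside each directory."""
    words = set()
    for directory in directories:
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            if not os.path.isfile(path):
                continue  # skip subdirectories and other non-files
            # An explicit encoding makes the result independent of the locale;
            # the with block closes the file even if decoding raises.
            with open(path, "rt", encoding=encoding) as handle:
                for line in handle:
                    words.update(word.lower() for word in line.split())
    return words


# Small demonstration on a temporary directory with made-up content.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "sample.txt"), "w", encoding="utf-8") as out:
        out.write("Zażółć gęślą\nzażółć Jaźń\n")
    demo_words = collect_words([demo_dir])
```

Using a set keeps the membership test cheap as the vocabulary grows, whereas the `not in` check on a list scans every stored word each time.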