Switching to Python 3 causing UnicodeDecodeError

Problem description

I've just added a Python 3 interpreter to Sublime, and the following code stopped working:

import os

# directoryList (the directories to scan) is defined elsewhere in the script.
bigBagOfWords = []

for directory in directoryList:
    fileList = os.listdir(directory)
    for filename in fileList:
        filename = os.path.join(directory, filename)
        currentFile = open(filename, 'rt')     # no encoding= given, so the locale default is used
        for line in currentFile:               ##Here comes the exception.
            currentLine = line.split(' ')
            for word in currentLine:
                if word.lower() not in bigBagOfWords:
                    bigBagOfWords.append(word.lower())
        currentFile.close()

I get the following exception:

  File "/Users/Kuba/Desktop/DictionaryCreator.py", line 11, in <module>
    for line in currentFile:
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 305: ordinal not in range(128)

I found this rather strange, because as far as I know Python 3 is supposed to support UTF-8 everywhere. What's more, the exact same code works without problems on Python 2.7. I've read about setting the environment variable PYTHONIOENCODING and tried it, to no avail (however, it appears it is not that easy to add an environment variable in OS X Mavericks, so maybe I did something wrong when adding it? I modified /etc/launchd.conf.)

Recommended answer

Python 3 decodes text files when reading and encodes them when writing. The default encoding is taken from locale.getpreferredencoding(False), which evidently returns 'ASCII' for your setup. See the open() function documentation:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
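
To check what your own interpreter falls back to, and to reproduce the failure without depending on the machine's locale, the sketch below prints the preferred encoding and then reads a small file with an explicit ASCII codec. The temporary file and its contents are invented for illustration. The byte 0xcc is the one from the traceback; in UTF-8 it is the first byte of characters such as U+0301 COMBINING ACUTE ACCENT.

```python
import locale
import os
import tempfile

# open() in text mode falls back to this value when encoding= is omitted;
# on the asker's setup it evidently returned an ASCII codec.
print("preferred encoding:", locale.getpreferredencoding(False))


def read_text(path, encoding):
    """Read a whole file as text using an explicit codec."""
    with open(path, "rt", encoding=encoding) as f:
        return f.read()


# Hypothetical file contents: "cafe" plus U+0301 COMBINING ACUTE ACCENT,
# which UTF-8 stores as the two bytes 0xcc 0x81.
TEXT = "cafe\u0301 menu\n"
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(TEXT.encode("utf-8"))

ascii_error = None
try:
    read_text(path, "ascii")  # mimics a locale whose default is ASCII
except UnicodeDecodeError as exc:
    ascii_error = exc  # keep the exception for inspection
    print("ascii failed:", exc)

utf8_result = read_text(path, "utf-8")  # the matching codec succeeds
print("utf-8 read:", repr(utf8_result))
os.remove(path)
```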

Instead of relying on a system setting, you should open your text files using an explicit codec:

currentFile = open(filename, 'rt', encoding='latin1')

where you set the encoding parameter to match the file you are reading.
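
Applied to the loop from the question, a version with an explicit codec might look like the following. It is only a sketch. The function name build_word_list is hypothetical, and UTF-8 is an assumed default (pass 'latin1' or whatever the files actually use). It also makes two deliberate changes: it uses a set for the membership test, which is faster, and it splits on any whitespace rather than on single spaces.

```python
import os


def build_word_list(directory_list, encoding="utf-8"):
    """Collect distinct lower-cased words from every file in the given directories.

    The codec is passed to open() explicitly, so the result no longer depends
    on locale.getpreferredencoding(False). UTF-8 is only an assumed default.
    """
    seen = set()  # fast membership test instead of scanning a list
    words = []    # keeps first-seen order, like bigBagOfWords in the question
    for directory in directory_list:
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            # 'with' closes the file even if decoding fails part-way through
            with open(path, "rt", encoding=encoding) as current_file:
                for line in current_file:
                    # split() on any whitespace; split(' ') as in the question
                    # would keep the newline attached to the last word
                    for word in line.split():
                        lowered = word.lower()
                        if lowered not in seen:
                            seen.add(lowered)
                            words.append(lowered)
    return words
```

With the question's variables, this would be called as build_word_list(directoryList, encoding='latin1') if the files turn out to be Latin-1.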

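Python 3 does use UTF-8 as the default for source code, but that default governs how your .py files are read, not the files your program opens with open().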
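
The standard library function tokenize.detect_encoding() is documented as detecting the encoding that should be used to decode a Python source file. It can therefore illustrate that default: bytes without a coding declaration are reported as UTF-8, and a PEP 263 declaration overrides that. This concerns source code only and has no influence on what open() uses for data files. The byte strings below are made-up examples.

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding Python would use to decode these bytes as source code."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# No coding declaration: Python 3 assumes UTF-8 for source code.
print(source_encoding(b"word = 'caf\xc3\xa9'\n"))
# An explicit declaration overrides that default; tokenize reports the
# normalized name 'iso-8859-1' for 'latin-1'.
print(source_encoding(b"# -*- coding: latin-1 -*-\nword = 'caf\xe9'\n"))
```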

The same applies to writing to a writable text file: the data written is encoded, and if you rely on the system encoding you are liable to get UnicodeEncodeError exceptions unless you explicitly set a suitable codec. Which codec to use when writing depends on what text you are writing and what you plan to do with the file afterward.
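
On the writing side, the failure can be reproduced the same way. The sketch below asks for the ASCII codec explicitly and writes non-ASCII text, then shows that naming a codec that can represent the text avoids the error. UTF-8 is chosen only as an example target encoding, and the temporary file is a throwaway.

```python
import os
import tempfile

TEXT = "na\u00efve caf\u00e9\n"  # contains non-ASCII characters

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# What writing with an ASCII default amounts to: the encode step fails.
encode_error = None
try:
    with open(path, "wt", encoding="ascii") as f:
        f.write(TEXT)
except UnicodeEncodeError as exc:
    encode_error = exc
    print("ascii failed:", exc)

# With an explicit codec the text round-trips, as long as the reader
# uses the same codec as the writer.
with open(path, "wt", encoding="utf-8") as f:
    f.write(TEXT)
with open(path, "rt", encoding="utf-8") as f:
    round_trip = f.read()
os.remove(path)
```

Whichever codec you pick for writing, anything that reads the file later has to be told the same codec.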

You may want to read up on Python 3 and Unicode in the Unicode HOWTO, which covers both source code encoding and reading and writing Unicode data.
