Which file encodings are supported for Python 3 source files?
Question
Before you go telling me to read PEP 0263, keep reading...
I can't find any documentation that details which file encodings are supported for Python 3 source files.
I've found hundreds (thousands?) of questions, answers, posts, emails, etc. about how to declare - at the top of your source file - the encoding of that source file, but none of them answer my question. Bear with me and imagine doing (or actually try) the following:
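- Open Notepad (I'm using plain old Notepad on Windows 7, but I doubt it matters; I'm sure your fancy editor can do something similar.)
- Type in your favorite line of Python code (I used print( 'Hello, world!' ))
- Select "File" -> "Save"
- Select a folder and file name (I used "E:\Temp\hello.py")
- Change the "Encoding:" setting from the default "ANSI" to "Unicode"
- Press "Save"
- Open a command prompt, change to the folder containing your new file, and try to run it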
Here's the output I get:
E:\Temp>python --version
Python 3.4.1
E:\Temp>python "hello.py"
File "hello.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xff' in file hello.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Now, when I open this same file in Notepad++ and look at the "Encoding" menu, it has the option "Encode in UCS-2 Little Endian" selected. Wikipedia tells me that this is basically UTF-16 encoding. Whatever. I don't really care. More research reveals that my editor has inserted a two-byte BOM (Byte Order Mark) with a value of '\xff\xfe' at the front of the file to indicate the file encoding. So at least I know where the '\xff' code that Python is complaining about comes from.
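To make the BOM concrete, here is a minimal sketch that builds the same kind of byte stream in memory. It assumes that Notepad's "Unicode" option means a UTF-16 little-endian BOM followed by UTF-16LE text, as described above; the variable names are only illustrative.

```python
import codecs

source = "print( 'Hello, world!' )\n"

# Write the BOM explicitly, then the text as UTF-16 little-endian
# (assumed to match what Notepad's "Unicode" option produces).
data = codecs.BOM_UTF16_LE + source.encode("utf-16-le")

first_two = data[:2]
print(first_two)  # the two BOM bytes at the very front of the file

# The default source encoding in Python 3 is UTF-8, and 0xff can never
# start a valid UTF-8 sequence, so decoding fails on the first byte.
try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("UTF-8 decode failed at byte offset", exc.start)
```

The failure at offset 0 is consistent with the interpreter complaining about line 1 before it ever looks at the code.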
So I go and read PEP 0263 - and everything else regarding it - on the web, and I try adding a comment like this to the first line of the file
# coding: utf-16
with all sorts of different values for the encoding, and nothing helps. But it can't help, right? Because Python isn't even getting as far as my encoding declaration; it's choking on the first byte of the source file!
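That suspicion can be checked with the standard library's tokenize.detect_encoding, which looks for a UTF-8 BOM or a PEP 263 encoding cookie. This is only a sketch: it uses the pure-Python detection in the tokenize module, which may not match the interpreter's own tokenizer in every detail, and the sniff helper is a hypothetical name introduced here.

```python
import codecs
import io
import tokenize


def sniff(data):
    """Return the encoding tokenize detects, or a description of the error."""
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
        return encoding
    except SyntaxError as exc:
        return "SyntaxError: {}".format(exc)


body = "print( 'Hello, world!' )\n"

# A UTF-16LE file with a BOM and a coding declaration on line 1.
utf16_file = codecs.BOM_UTF16_LE + ("# coding: utf-16\n" + body).encode("utf-16-le")

# The same program saved as UTF-8 with a matching declaration.
utf8_file = ("# coding: utf-8\n" + body).encode("utf-8")

print(sniff(utf16_file))  # the declaration is never reached
print(sniff(utf8_file))
```

The declaration in the UTF-16 file is stored as two bytes per character behind a BOM, so the detection gives up before it can recognize the comment.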
So what I really want to know is...
- Why can't the Python 3 interpreter read this file?
- If "Unicode" or "UCS-2 Little Endian" or "UTF-16" or whatever isn't supported, what is???
P.S. I even found another question on StackOverflow which seems to be the exact issue I'm having, but it was closed - erroneously in my opinion - as a duplicate. :(
--- EDIT ---
Someone asked for my "compiled options". Here's some output. Maybe it will help?
E:\Temp>python
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sysconfig
>>> print( sysconfig.get_config_vars() )
{'EXT_SUFFIX': '.pyd', 'srcdir': 'C:\Python34', 'py_version_short': '3.4', 'base': 'C:\Python34', 'prefix': 'C:\Python34', 'projectbase': 'C:\Python34', 'INCLUDEPY': 'C:\Python34\Include', 'platbase': 'C:\Python34', 'py_version_nodot': '34', 'exec_prefix': 'C:\Python34', 'EXE': '.exe', 'installed_base': 'C:\Python34', 'SO': '.pyd', 'installed_platbase': 'C:\Python34', 'VERSION': '34', 'BINLIBDEST': 'C:\Python34\Lib', 'LIBDEST': 'C:\Python34\Lib', 'userbase': 'C:\Users\alonghi\AppData\Roaming\Python', 'py_version': '3.4.1', 'abiflags': '', 'BINDIR': 'C:\Python34'}
>>>
Recommended answer
The source encoding must be:
- An encoding supported by the version of Python in question. (This varies across versions and platforms; for example, you only get mbcs on Windows.)
- Loosely ASCII-compatible, enough that the # coding: declaration can be read using ascii, which is the initial source encoding before any declaration is read. See PEP 0263 "Concepts" item 1.
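As a rough illustration of the second requirement, the sketch below checks whether a codec encodes a declaration line to the same bytes as ASCII does. This is a crude test written for this example, not the check the interpreter performs, and looks_ascii_compatible is a hypothetical helper name.

```python
DECLARATION = "# coding: utf-8\n"


def looks_ascii_compatible(codec_name):
    """True if the codec encodes the declaration line to the same bytes as ASCII."""
    return DECLARATION.encode(codec_name) == DECLARATION.encode("ascii")


candidates = ("utf-8", "latin-1", "cp1252", "utf-16-le", "utf-32-le")
results = {name: looks_ascii_compatible(name) for name in candidates}

for name, ok in results.items():
    print(name, ok)
```

The UTF-16 and UTF-32 codecs fail this test because every character, including those in the declaration, takes two or four bytes.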
The encoding that Windows misleadingly calls "Unicode", UTF-16LE, is not ASCII-compatible (and generally is a barrel of problems you should try to avoid using). Python would need special encoding-specific support to detect UTF-16 source files and this feature has been declined for now.
The # coding: you should use is almost invariably UTF-8.
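If a file has already been saved as UTF-16, one way to follow this advice is to re-save it as UTF-8. The sketch below does that with the standard library, using a temporary folder and a hypothetical hello.py so it does not touch any real file; re-saving from the editor with a UTF-8 option should have the same effect.

```python
import codecs
import os
import tempfile

source = "print( 'Hello, world!' )\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "hello.py")  # hypothetical file name

    # Recreate the problem file: BOM followed by UTF-16LE text.
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE + source.encode("utf-16-le"))

    # The "utf-16" codec consumes the BOM and takes the byte order from it.
    with open(path, encoding="utf-16", newline="") as f:
        text = f.read()

    # Rewrite the same text as UTF-8 without a BOM, leaving newlines untouched.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)

    with open(path, "rb") as f:
        converted = f.read()

# compile() accepts bytes and applies the usual source-encoding rules,
# so this succeeds where the UTF-16 version raised a SyntaxError.
code = compile(converted, "hello.py", "exec")
print(converted)
```

Since UTF-8 is the default source encoding in Python 3, the converted file needs no # coding: line at all.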