What is the default encoding method for code assumed by the Python interpreter?
Question
Some people use the following to declare the encoding method for the text of their Python source code:
```python
# -*- coding: utf-8 -*-
```
Back in 2001, it was said that the default encoding the Python interpreter assumes is ASCII. I have used strings containing non-ASCII characters in my Python code without declaring the encoding of my code, and I don't remember ever running into an encoding error. What is the default encoding for code assumed by the Python interpreter now?
I am not sure if this is relevant: my OS is Ubuntu, and I am using the default Python interpreter, with gedit or emacs for editing. Will the default encoding assumed by the Python interpreter change if any of the above changes?
Thanks.
Without any explicit encoding declaration, the assumed encoding for your source code will be:
- `ascii` for Python 2.x
- `utf-8` for Python 3.x
See PEP 0263 and Using source code encoding for Python 2.x, and PEP 3120 for the new default of `utf-8` for Python 3.x.
So the default encoding assumed for source code depends directly on the version of the Python interpreter, and it is not configurable (you can only override it per file with an encoding declaration).
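As a quick check of the Python 3 side of this, the sketch below compiles the same UTF-8 bytes twice with the built-in `compile()`: once with no declaration, where the PEP 3120 `utf-8` default applies, and once with a `latin-1` declaration (chosen only for illustration), which overrides the default for that one source. It assumes Python 3, and the helper `run` is hypothetical.

```python
def run(source: bytes) -> str:
    """Hypothetical helper: compile and execute `source`, return the value bound to `s`."""
    namespace = {}
    # compile() given bytes decodes them the way it would a .py file:
    # using the coding declaration if present, else the UTF-8 default.
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["s"]

# UTF-8 encoded source for: s = 'B<U+00E4>r' (written with an escape to keep this file ASCII)
body = "s = 'B\u00e4r'\n".encode("utf-8")

undeclared = run(body)
declared_latin1 = run(b"# -*- coding: latin-1 -*-\n" + body)

print(undeclared == "B\u00e4r")  # True: the UTF-8 default applied
print(len(declared_latin1))      # 4: the two UTF-8 bytes were read as two Latin-1 characters
```

The declaration changes only how the bytes of that particular source are read; it does not change the interpreter's default for any other file.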
Note that the source code encoding is something entirely different from dealing with non-ASCII characters as part of your data in strings.
There are two distinct cases where you may encounter non-ASCII characters:
- As part of your program's data, at runtime
- As part of your source code (and since you can't have non-ASCII characters in identifiers in Python 2.x, that usually means hard-coded string data in your source code, or comments).
The source code encoding declaration determines which encoding your source code will be interpreted with, so it's only needed if you decide to put non-ASCII characters directly in your source code.
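The rule in that paragraph can be expressed as a tiny check. `needs_coding_declaration` below is a hypothetical helper, not part of any library: it only asks whether the raw bytes of a file decode under the interpreter's default source encoding (`ascii` for Python 2.x, `utf-8` for Python 3.x). As a sketch, it deliberately ignores BOMs and any declaration already present in the file.

```python
def needs_coding_declaration(source: bytes, default: str) -> bool:
    """Hypothetical helper: True if `source` can't be read with `default` alone.

    `default` is the interpreter's default source encoding, e.g. "ascii"
    (Python 2.x) or "utf-8" (Python 3.x). BOMs and existing coding
    declarations are ignored in this sketch.
    """
    try:
        source.decode(default)
    except UnicodeDecodeError:
        return True
    return False


ascii_only = b"with open('data.txt') as f:\n    pass\n"
with_umlaut = "TEST_DATA = 'B\u00e4r'\n".encode("utf-8")

print(needs_coding_declaration(ascii_only, "ascii"))   # False
print(needs_coding_declaration(with_umlaut, "ascii"))  # True  (Python 2.x default)
print(needs_coding_declaration(with_umlaut, "utf-8"))  # False (Python 3.x default)
```

An ASCII-only file passes the check for both defaults, no matter what data it later reads at runtime.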
So the following code will eventually have to deal with the fact that there might be non-ASCII characters in `data.txt`:
```python
with open('data.txt') as f:
    for line in f:
        # do something with `line`
        pass
```
But it doesn't contain any non-ASCII characters in its source code, so it doesn't need an encoding declaration at the top of the file. It will, however, need to properly decode `line` if it wants to turn it into `unicode`. Simply doing `unicode(line)` will use the system default encoding, which is `ascii` (different from the default source encoding, but it happens to also be `ascii`). So to explicitly decode the string using `utf-8`, you'd need to do `line.decode('utf-8')`.
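For comparison, here is a minimal Python 3 sketch of the same idea. Bytes read from a file only become text once they are decoded with a chosen encoding, and decoding UTF-8 bytes as `ascii` fails, as Python 2's `unicode(line)` would. The temporary file stands in for `data.txt` and is an assumption of the sketch. In Python 3, `sys.getdefaultencoding()` reports `utf-8` rather than `ascii`, and text-mode `open()` decodes for you, using a locale-dependent encoding unless you pass `encoding=`.

```python
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Stand-in for the answer's data.txt: one line of UTF-8 encoded text.
    path = os.path.join(tmpdir, "data.txt")
    with open(path, "wb") as f:
        f.write("B\u00e4r\n".encode("utf-8"))

    # Python 3 counterpart of line.decode('utf-8'): read raw bytes, decode explicitly.
    with open(path, "rb") as f:
        decoded = [raw.decode("utf-8").rstrip("\n") for raw in f]

    # Or let open() decode, naming the encoding instead of relying on the locale.
    with open(path, encoding="utf-8") as f:
        text_lines = [line.rstrip("\n") for line in f]

# Decoding the same bytes as ASCII fails, like unicode(line) did in Python 2.
try:
    b"B\xc3\xa4r".decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(decoded == text_lines == ["B\u00e4r"])  # True
print(ascii_failed)                           # True
print(sys.getdefaultencoding())               # utf-8
```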
This code, however, does contain non-ASCII characters directly in its source code:
```python
TEST_DATA = 'Bär'  # <--- non-ASCII character on this line
print TEST_DATA
```
And unless you declare an explicit source code encoding, it will fail with a `SyntaxError` similar to this:
```
SyntaxError: Non-ASCII character '\xc3' in file foo.py on line 1, but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details
```
So, assuming your text editor is configured to save files in `utf-8`, you'd need to put the line
```python
# -*- coding: utf-8 -*-
```
at the top of the file for Python to interpret the source code correctly.
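PEP 263 only looks for the declaration in the first two lines of the file, and it accepts any comment containing `coding:` or `coding=` followed by the encoding name, so both the Emacs-style line and a Vim-style `fileencoding=` modeline work. The sketch below uses Python 3's `tokenize.detect_encoding`, which applies these rules. The helper `declared_encoding` is hypothetical, and `latin-1` is used only so the result differs from the `utf-8` default; the standard library reports it under its normalized name `iso-8859-1`.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Hypothetical helper: the encoding Python 3 would use to read `source` as a .py file."""
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


emacs_style = b"# -*- coding: latin-1 -*-\nx = 1\n"
vim_style = b"# vim: set fileencoding=latin-1 :\nx = 1\n"
after_shebang = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nx = 1\n"
too_late = b"#!/usr/bin/env python\n\n# -*- coding: latin-1 -*-\nx = 1\n"

for name, src in [("emacs", emacs_style), ("vim", vim_style),
                  ("after shebang", after_shebang), ("line 3", too_late)]:
    print(name, declared_encoding(src))
# emacs iso-8859-1
# vim iso-8859-1
# after shebang iso-8859-1
# line 3 utf-8
```

A declaration on the third line is silently ignored, and the file falls back to the default encoding.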
My advice, however, would be to generally avoid putting non-ASCII characters in your source code, precisely because whether they are written and read correctly depends on your and your co-workers' editor and terminal settings.
Instead, you can use escaped strings to safely enter non-ASCII characters in your code:
```python
TEST_DATA = 'B\xc3\xa4r'
```
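One caveat applies when carrying this advice over to Python 3. There, `'B\xc3\xa4r'` is a text string of four separate code points, not the UTF-8 bytes it was in Python 2. This minimal sketch shows ASCII-only ways to write the same value in Python 3, either as a bytes literal to decode or as a text escape:

```python
# Every line of this snippet is plain ASCII, so no coding declaration is needed.
TEST_BYTES = b'B\xc3\xa4r'                                # the answer's escape, as UTF-8 bytes
TEST_DATA = 'B\u00e4r'                                    # U+00E4 as a text escape
TEST_NAMED = 'B\N{LATIN SMALL LETTER A WITH DIAERESIS}r'  # same character, by name

print(TEST_BYTES.decode('utf-8') == TEST_DATA == TEST_NAMED)  # True
print(len('B\xc3\xa4r'), len(TEST_DATA))                      # 4 3
```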