What is the default source code encoding assumed by the Python interpreter?

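Question

Some people declare the encoding of their Python source code with a line like this:

# -*- coding: utf-8 -*-

Back in 2001, the default encoding assumed by the Python interpreter was said to be ASCII. I have worked with strings containing non-ASCII characters in my Python code without declaring the encoding of the source, and I don't remember running into encoding errors. What is the default encoding that the Python interpreter assumes for source code now?

I am not sure whether this is relevant: my OS is Ubuntu, and I am using the default Python interpreter, with gedit or emacs for editing. Would the default encoding assumed by the interpreter change if any of these changed?

Thanks.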

Answer

Without any explicit encoding declaration, the assumed encoding for your source code will be

  • ascii for Python 2.x
  • utf-8 for Python 3.x

See PEP 0263 and "Using source code encoding" for Python 2.x, and PEP 3120 for the new default of utf-8 in Python 3.x.

So the default encoding assumed for source code depends directly on the version of the Python interpreter, and it is not configurable.
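As a rough illustration, assuming Python 3, the standard library's `tokenize.detect_encoding` reports which encoding would be assumed for a given chunk of source bytes. The helper name `source_encoding` and the sample sources are made up for this sketch.

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding Python 3 would assume for these source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Hypothetical sources: one without a declaration, one declaring latin-1.
no_declaration = "name = 'B\u00e4r'\n".encode("utf-8")
with_declaration = b"# -*- coding: latin-1 -*-\nname = 'B\xe4r'\n"

print(source_encoding(no_declaration))    # falls back to the Python 3 default
print(source_encoding(with_declaration))  # taken from the declaration
```

Without a declaration the function falls back to the interpreter's built-in default, which matches the point above that the default is tied to the interpreter version rather than to any setting.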


Note that the source code encoding is something entirely different from dealing with non-ASCII characters as part of your data in strings.

There are two distinct cases where you may encounter non-ASCII characters:

  • As part of your program's data, at runtime
  • As part of your source code (and since Python 2 doesn't allow non-ASCII characters in identifiers, that usually means hard-coded string data or comments in your source code; Python 3 does allow non-ASCII identifiers).

The source code encoding declaration affects what encoding your source code will be interpreted with - so it's only needed if you decide to directly put non-ASCII characters in your source code.

So, the following code will eventually have to deal with the fact that there might be non-ASCII characters in data.txt:

with open('data.txt') as f:
    for line in f:
        # do something with `line`
        pass

But it doesn't contain any non-ASCII characters in the source code, so it doesn't need an encoding declaration at the top of the file. It will, however, need to decode line properly if it wants to turn it into unicode. In Python 2, simply calling unicode(line) uses the system default encoding, which is ascii (this is a separate setting from the default source encoding, which merely happens to be ascii as well). So to explicitly decode the string as utf-8 you'd need to call line.decode('utf-8'). In Python 3, open() in text mode already yields decoded str lines, and you can pass encoding='utf-8' to make the choice explicit.
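A minimal sketch of the runtime-data case, assuming Python 3 and a throwaway file created just for the demonstration (the path and file contents are hypothetical). The source itself stays pure ASCII, while the data does not:

```python
import os
import tempfile

# Create a temporary data file containing a non-ASCII character.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "wb") as f:
    f.write("B\u00e4r\n".encode("utf-8"))

# Option 1: let open() decode, naming the encoding explicitly.
with open(path, encoding="utf-8") as f:
    text_lines = [line.rstrip("\n") for line in f]

# Option 2: read raw bytes and decode each line yourself.
with open(path, "rb") as f:
    decoded_lines = [raw.decode("utf-8").rstrip("\n") for raw in f]

print(text_lines == decoded_lines)
```

Both options name the encoding explicitly, so the result does not depend on the platform's default.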


This code however does contain non-ASCII characters directly in its source code:

TEST_DATA = 'Bär'    # <--- non-ASCII character on this line
print TEST_DATA

Under Python 2, it will fail with a SyntaxError similar to this unless you declare an explicit source code encoding:

SyntaxError: Non-ASCII character '\xc3' in file foo.py on line 1, but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details

So assuming your text editor is configured to save files in utf-8, you'd need to put the line

# -*- coding: utf-8 -*-

at the top of the file (on the first or second line) for Python to interpret the source code correctly.
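To see why the declaration has to match what the editor actually wrote, here is a small sketch, assuming Python 3, that compiles the same bytes under two different declarations. The helper `run_source` is invented for this example; `compile()` accepts raw bytes and honours the declaration in them.

```python
def run_source(source_bytes):
    """Compile and execute raw source bytes, returning the resulting namespace."""
    namespace = {}
    exec(compile(source_bytes, "<demo>", "exec"), namespace)
    return namespace


# The bytes an editor configured for UTF-8 would save for the Bär example.
body = b"TEST_DATA = 'B\xc3\xa4r'\n"

matching = run_source(b"# -*- coding: utf-8 -*-\n" + body)
mismatched = run_source(b"# -*- coding: latin-1 -*-\n" + body)

print(len(matching["TEST_DATA"]))    # 3 characters: the intended text
print(len(mismatched["TEST_DATA"]))  # 4 characters: the two UTF-8 bytes were read as two latin-1 characters
```

With the wrong declaration nothing fails loudly; the string is silently wrong.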

My advice, however, would be to generally avoid putting non-ASCII characters in your source code, precisely because it then depends on your and your co-workers' editor and terminal settings whether they will be written and read correctly.

Instead, you can use escape sequences to enter non-ASCII characters safely in your code (here, the UTF-8 bytes of 'ä' in a Python 2 byte string):

TEST_DATA = 'B\xc3\xa4r'
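For comparison, a sketch of the same idea in Python 3, where text and bytes are separate types. The names `TEST_TEXT` and `TEST_BYTES` are chosen only for this illustration. The source stays pure ASCII either way:

```python
# A text string written with a Unicode escape for U+00E4.
TEST_TEXT = "B\u00e4r"

# The equivalent UTF-8 byte sequence as a bytes literal.
TEST_BYTES = b"B\xc3\xa4r"

# The two forms convert into each other through the utf-8 codec.
print(TEST_BYTES.decode("utf-8") == TEST_TEXT)
print(TEST_TEXT.encode("utf-8") == TEST_BYTES)
```

Note that in Python 3 the literal 'B\xc3\xa4r' without the b prefix would be a four-character text string, not UTF-8 bytes, so the prefix matters.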
