Python 2 assumes different source code encodings

Views: 94 Posted: 2020/6/29 19:34:58 Tags: python character-encoding ascii iso-8859-1 python-internals

Question

I noticed that without source code encoding declaration, the Python 2 interpreter assumes the source code is encoded in ASCII with scripts and standard input:

$ python test.py  # where test.py holds the line: print u'é'
  File "test.py", line 1
SyntaxError: Non-ASCII character '\xc3' in file test.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

$ echo "print u'é'" | python
  File "/dev/fd/63", line 1
SyntaxError: Non-ASCII character '\xc3' in file /dev/fd/63 on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

and assumes it is encoded in ISO-8859-1 with the -m module and -c command flags:

$ python -m test  # where test.py holds the line: print u'é'
Ã©

$ python -c "print u'é'"
Ã©

Where is this documented?

Contrast this to Python 3 which always assumes the source code is encoded in UTF-8 and thus prints é in the four cases.

Note: I tested this on CPython 2.7.14 on both macOS 10.13 and Ubuntu Linux 17.10, with the console encoding set to UTF-8.

Recommended answer

The -c and -m switches ultimately^(*) run the code supplied with the exec statement or the compile() function, both of which take Latin-1 source code:

The first expression should evaluate to either a Unicode string, a Latin-1 encoded string, an open file object, a code object, or a tuple.

This is not documented; it is an implementation detail that may or may not be considered a bug.
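To see why the -c and -m cases print Ã© rather than é, it helps to replay the byte-level steps by hand. The following is only a sketch of the encoding mismatch, not of the interpreter internals; it assumes a UTF-8 console, as in the question, and the variable names are purely illustrative.

```python
# -*- coding: utf-8 -*-
# A UTF-8 terminal hands 'é' (U+00E9) to Python as the two bytes 0xC3 0xA9.
utf8_bytes = u'\xe9'.encode('utf-8')

# Treating those bytes as Latin-1 maps each byte to its own code point,
# so the one-character literal silently becomes a two-character string.
as_latin1 = utf8_bytes.decode('latin-1')

# Printing to a UTF-8 console re-encodes both code points, giving four bytes.
reencoded = as_latin1.encode('utf-8')

print(repr(utf8_bytes))   # the bytes the shell actually passed
print(len(as_latin1))     # number of characters after the Latin-1 decode
print(repr(reencoded))    # the bytes that reach the terminal
```

The same two-step confusion (UTF-8 bytes read as Latin-1, then written back as UTF-8) is what the output in the question shows.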

I don't think this is worth fixing, however, and since Latin-1 is a superset of ASCII, little is lost. The handling of code from -c and -m has been cleaned up in Python 3 and is much more consistent there: code passed in with -c is decoded using the current locale, and modules loaded with the -m switch default to UTF-8, as usual.
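For comparison, here is a minimal Python 3 sketch of how compile() treats byte-string source. It assumes Python 3 (it will not behave the same way under Python 2), and the filename "<demo>" and the variable names are placeholders chosen for the example.

```python
# Python 3 sketch: compile() decodes byte-string source as UTF-8 by default.
source = b"text = '\xc3\xa9'\n"   # the UTF-8 bytes for: text = 'é'

namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)

# A PEP 263 declaration on the first line overrides that default, so the
# same bytes are now decoded as Latin-1, one character per byte.
declared = b"# -*- coding: latin-1 -*-\n" + source
declared_namespace = {}
exec(compile(declared, "<demo>", "exec"), declared_namespace)

print(ascii(namespace["text"]))           # one code point
print(ascii(declared_namespace["text"]))  # two code points
```

With the explicit Latin-1 declaration, Python 3 reproduces the mojibake that Python 2 produces implicitly for -c and -m.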

^(*) If you want to know the exact implementations used, start at the Py_Main() function in Modules/main.c, which handles both -c and -m as:

if (command) {
    sts = PyRun_SimpleStringFlags(command, &cf) != 0;
    free(command);
} else if (module) {
    sts = RunModule(module, 1);
    free(module);
}

-c is executed through the PyRun_SimpleStringFlags() function, which in turn calls PyRun_StringFlags(). When you use exec, a bytestring object is passed to PyRun_StringFlags() too, and the source code is then assumed to contain Latin-1-encoded bytes.

-m uses the RunModule() function to pass the module name to the private function _run_module_as_main() in the runpy module, which uses pkgutil.get_loader() to load the module metadata and fetches the module code object with the loader.get_code() function on the PEP 302 loader; if no cached bytecode is available, the code object is produced by using the compile() function with the mode set to exec.
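The -m path can be approximated from Python code with the public runpy.run_module() function, which goes through the same loader machinery as the private _run_module_as_main(). The sketch below assumes Python 3 (it relies on tempfile.TemporaryDirectory), and the helper run_as_main and the module name encoding_demo are hypothetical names made up for the example.

```python
import os
import runpy
import sys
import tempfile


def run_as_main(source_bytes, module_name="encoding_demo"):
    """Write source_bytes to a temporary module and run it roughly like `python -m`."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, module_name + ".py")
        with open(path, "wb") as handle:
            handle.write(source_bytes)
        sys.path.insert(0, tmp)
        try:
            # run_module() locates the module, obtains its code object from
            # the loader, executes it, and returns the resulting globals.
            return runpy.run_module(module_name, run_name="__main__")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)


# UTF-8 bytes for: text = 'é', with no encoding declaration in the file.
module_globals = run_as_main(b"text = '\xc3\xa9'\n")
print(ascii(module_globals["text"]))
```

Under Python 3 the undeclared module source is decoded as UTF-8, so the string holds a single character, in line with the behaviour described above for modules loaded with -m.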
