How do I set the PYTHONUTF8 environment variable to enable UTF-8 encoding by default in Python?

Question

Python 3.7 introduced the PYTHONUTF8 environment variable to enable UTF-8 encoding by default. How do I set this variable from within a Python program? (I can't find it in my operating system's list of environment variables.)

Answer

To access environment variables, and modify them if your platform allows it (which Windows and all popular Unixes do), just use os.environ.

However, this isn’t going to do any good, unless you’re trying to set the environment variable for Python child processes that you’re launching with subprocess or the like. Python reads its environment variables at startup, uses them to pick up configuration information, and doesn’t check them again later.
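
As a minimal sketch of the one case where setting the variable from Python does help, the following launches a child interpreter with PYTHONUTF8 in its environment and reports the child's `sys.flags.utf8_mode`. The helper name `utf8_mode_in_child` is made up for illustration, and the sketch assumes Python 3.7 or later.

```python
import os
import subprocess
import sys


def utf8_mode_in_child(enabled: bool) -> str:
    """Start a child Python with PYTHONUTF8 set and return its utf8_mode flag.

    Hypothetical helper for illustration; assumes Python 3.7+.
    """
    # Copy the parent's environment so os.environ itself is left untouched.
    env = os.environ.copy()
    env["PYTHONUTF8"] = "1" if enabled else "0"
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The child picks the setting up because the variable is already in place when that interpreter starts; the parent's own mode is unaffected.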

The point of these environment variables (and command-line flags) is to set them in your shell, launcher script, etc., so they’re available when Python starts, not to set them from within Python.
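
The command-line flag counterpart of PYTHONUTF8 is `-X utf8`. A small sketch of passing it at startup, here from a Python launcher; the helper name `run_with_utf8_flag` is an assumption for illustration, and Python 3.7 or later is assumed.

```python
import subprocess
import sys


def run_with_utf8_flag(code: str) -> str:
    """Run `code` in a fresh interpreter started with -X utf8; return its stdout.

    Hypothetical helper for illustration; assumes Python 3.7+.
    """
    result = subprocess.run(
        # The flag must come before -c so the interpreter sees it at startup.
        [sys.executable, "-X", "utf8", "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, `run_with_utf8_flag("import sys; print(sys.flags.utf8_mode)")` asks the freshly started interpreter whether UTF-8 mode is on.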

Normally, if you need this setting, you’re going to need it globally, so you’ll want to set it in your shell profile script (for Linux), your OS’s GUI for environment variables (for Windows), or both (for macOS—although on Mac, everything is already guaranteed to be set to UTF-8, and I believe even if you manage to break that somehow, Python will ignore it).

You’re not going to find this in your existing list of environment variables (unless maybe you’re on an unusual Linux distro that does something odd with the locale settings but needs its default Python to ignore them), but that doesn’t matter; you can add any environment variables you want.

But if you want to change things on the fly, while you can’t do that by setting an environment variable, you don’t need to, either.

As the docs explain, what it controls is setting the filesystem encoding, preferred encoding, and stdio files encoding.
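
Before changing anything, it can help to see what those three settings currently are. A minimal sketch that only reads them; the function name `current_encodings` is an assumption for illustration.

```python
import locale
import sys


def current_encodings() -> dict:
    """Collect the encodings that UTF-8 mode influences (illustrative helper)."""
    return {
        # Used to convert file names and similar OS strings.
        "filesystem": sys.getfilesystemencoding(),
        # Default for open() when no encoding argument is given.
        "preferred": locale.getpreferredencoding(False),
        # The stdio streams may have been replaced or be missing, so use getattr.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
    }
```

Printing the result once under a normal start and once under `PYTHONUTF8=1` shows which values the setting actually changes on a given system.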

The first two, you can just call the same functions in sys and locale to set them at any time.

If you also want to change the stdio files, that's a bit trickier. Python 3.7 added a reconfigure() method to io.TextIOWrapper that can change a stream's encoding in place; on older versions, or if you prefer fresh objects, you can replace them with new file objects wrapped around the same file descriptor, which looks something like this (untested for now):

import sys

# Re-wrap each standard stream's file descriptor in a new UTF-8 text file object.
sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8', errors='surrogateescape')
sys.stderr = open(sys.stderr.fileno(), 'w', encoding='utf-8', errors='backslashreplace')
sys.stdin = open(sys.stdin.fileno(), 'r', encoding='utf-8', errors='surrogateescape')

If you’ve already printed anything to stdout or typed/piped anything into stdin, you may need to flush everything first.
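
A minimal sketch of the in-place route, assuming Python 3.7 or later, where `io.TextIOWrapper.reconfigure()` exists. The helper name `force_utf8` is hypothetical, and the flush reflects the caveat above about output that has already been written.

```python
import io


def force_utf8(stream: io.TextIOWrapper, errors: str = "surrogateescape") -> io.TextIOWrapper:
    """Switch an existing text stream to UTF-8 without replacing the object.

    Illustrative helper; relies on TextIOWrapper.reconfigure() (Python 3.7+).
    """
    # Push out anything buffered under the old encoding first.
    stream.flush()
    stream.reconfigure(encoding="utf-8", errors=errors)
    return stream


# Possible usage (not executed here):
#   import sys
#   force_utf8(sys.stdout)
#   force_utf8(sys.stderr, errors="backslashreplace")
```

For input streams this is best done before anything has been read, since the encoding of a stream cannot be changed once data has been consumed from it.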

The only remaining issue that I know of is that sys.argv and os.environ will (at least on Unix) have already been decoded with the wrong encoding. You can fix the args by reencoding and redecoding before setting the default encodings. I think this uses the locale settings, so it would look like:

import locale, sys
sys.argv = [arg.encode(locale.getpreferredencoding(), errors='surrogateescape').decode('utf-8', errors='surrogateescape') for arg in sys.argv]
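
To make the re-encode/re-decode step concrete, here is a small worked sketch on a single string. The function name `redecode` and the choice of Latin-1 as the "wrong" encoding are assumptions for illustration only.

```python
def redecode(text: str, wrong_encoding: str) -> str:
    """Undo a decode done with `wrong_encoding` and decode as UTF-8 instead.

    Illustrative helper; surrogateescape keeps undecodable bytes round-trippable.
    """
    # Recover the original bytes by reversing the wrong decode.
    raw = text.encode(wrong_encoding, errors="surrogateescape")
    # Interpret those same bytes as UTF-8.
    return raw.decode("utf-8", errors="surrogateescape")


# Simulate an argument whose UTF-8 bytes were decoded as Latin-1.
mangled = "caf\u00e9".encode("utf-8").decode("latin-1")
repaired = redecode(mangled, "latin-1")
```

Here `mangled` holds two wrong characters in place of the accented letter, and `repaired` is the intended string again.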

Fixing the environment is a bit trickier, because if you try to mutate os.environ it’s going to do a putenv call that you don’t want. If this is an issue, the best option is probably to make a transcoded copy of environ and use that for lookups, and explicitly pass it to subprocess, etc.
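
A sketch of the transcoded-copy idea, under the same assumptions as the argument fix: values were decoded with some known wrong encoding and are really UTF-8. The function name `transcoded_environ` is hypothetical; it builds a plain dict and never writes to `os.environ`, so no putenv call happens.

```python
from typing import Dict, Mapping


def transcoded_environ(environ: Mapping[str, str], wrong_encoding: str) -> Dict[str, str]:
    """Return a re-decoded copy of an environment mapping (illustrative helper)."""

    def fix(text: str) -> str:
        # Reverse the wrong decode, then decode the same bytes as UTF-8.
        raw = text.encode(wrong_encoding, errors="surrogateescape")
        return raw.decode("utf-8", errors="surrogateescape")

    # A new dict is built; the mapping passed in is only read.
    return {fix(key): fix(value) for key, value in environ.items()}


# Possible usage (not executed here):
#   import locale, os
#   env = transcoded_environ(os.environ, locale.getpreferredencoding(False))
#   env.get("SOME_VARIABLE")
```

Lookups then go through the copy rather than through `os.environ`.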
