Windows console encoding


Question

What is the default console encoding on Windows? Sometimes it appears to be the ANSI encoding (CP-1252), and sometimes the OEM encoding (CP-850 by default for Western Europe), as reported by the chcp command.


  • Command-line arguments and environment variables trigger the ANSI encoding (é = 0xe9):

> chcp 850
Active code page: 850
> python -c "print 'é'"
Ú
> python -c "print '\x82'"
é
> python -c "print '\xe9'"
Ú
> $env:foobar="é"; python -c "import os; print os.getenv('foobar')"
Ú

> chcp 1252
Active code page: 1252
> python -c "print 'é'"
é
> python -c "print '\x82'"
,
> python -c "print '\xe9'"
é
> $env:foobar="é"; python -c "import os; print os.getenv('foobar')"
é


  • The Python console and standard input trigger the OEM encoding (é = 0x82 if the OEM encoding is CP-850, é = 0xe9 if the OEM encoding is CP-1252):

    > chcp 850
    Active code page: 850
    > python
    >>> print 'é'
    é
    >>> print '\x82'
    é
    >>> print '\xe9'
    Ú
    > python -c "print raw_input()"
    é
    é
    
    > chcp 1252
    Active code page: 1252
    > python
    >>> print 'é'
    é
    >>> print '\x82'
    ,
    >>> print '\xe9'
    é
    > python -c "print raw_input()"
    é
    é
    


  • Note: in these examples I used PowerShell 5.1 and CPython 2.7.14 on Windows 10.

Recommended answer

First of all, for all your non-ASCII characters, what matters here is your console encoding and your Windows locale settings. You are using byte strings, and Python just prints out the bytes it received. Your keyboard input is encoded to a specific byte or byte sequence by the console before those bytes are passed on to Python. To Python, this is all just opaque data (numbers in the range 0-255), and print passes those bytes back to the console exactly as Python received them.
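The byte values in the question's transcripts can be checked with a short sketch. It is written for Python 3 (the question uses Python 2.7), and the helper name show_bytes is made up for illustration. It encodes é under both code pages, then decodes each resulting byte with the other code page, which reproduces the Ú and the comma-like ‚ seen in the transcripts.

```python
E_ACUTE = "\u00e9"  # the character é


def show_bytes(char, codec):
    """Return the bytes that `codec` uses for `char` (hypothetical helper)."""
    return char.encode(codec)


ansi = show_bytes(E_ACUTE, "cp1252")  # b'\xe9'
oem = show_bytes(E_ACUTE, "cp850")    # b'\x82'

# A console set to one code page shows bytes produced under the other
# code page as a different character.
ansi_seen_as_oem = ansi.decode("cp850")   # U+00DA, Ú
oem_seen_as_ansi = oem.decode("cp1252")   # U+201A, a low quotation mark

# ascii() keeps the output printable whatever the terminal encoding is.
print(ansi, oem, ascii(ansi_seen_as_oem), ascii(oem_seen_as_ansi))
```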

In PowerShell, the encoding used for the bytes sent to Python via command-line switches is not determined by the chcp code page, but by the Language for non-Unicode programs setting in the Control Panel (search for Region, then find the Administrative tab). It is this setting that encodes é to 0xE9 before passing it to Python as a command-line argument. A large number of Windows code pages use 0xE9 for é (but there is no such thing as an ANSI encoding).

The same applies to environment variables. Python refers to the encoding Windows uses here as the MBCS codec; you can decode command-line parameters or environment variables to Unicode using the 'mbcs' codec, which uses the MultiByteToWideChar() and WideCharToMultiByte() Windows API functions with the CP_ACP flag.
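As a rough illustration of decoding with the 'mbcs' codec, here is a minimal sketch. The function name decode_ansi and the cp1252 fallback are assumptions made for this example: the 'mbcs' codec exists only on Windows, so on other platforms the sketch falls back to a fixed code page, which is only correct for a Western European setup.

```python
import codecs


def decode_ansi(raw, fallback="cp1252"):
    """Decode bytes with the Windows ANSI code page via the 'mbcs' codec.

    The 'mbcs' codec is only available on Windows. Elsewhere this uses
    `fallback`, which is an assumption (a Western European system).
    """
    try:
        codecs.lookup("mbcs")
    except LookupError:
        return raw.decode(fallback)
    return raw.decode("mbcs")


# The result depends on the system's ANSI code page, so it is printed
# with ascii() rather than assumed here.
print(ascii(decode_ansi(b"caf\xe9")))
```

On a system whose ANSI code page is not a single-byte Western one, the same byte may decode to a different character or fail to decode at all.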

When you use the interactive prompt, Python is passed bytes encoded with the PowerShell console code page, which is set with chcp. For you that is code page 850, so Python receives a byte with the hex value 0x82 when you type é. Because print sends the same 0x82 byte back to the same console, the console translates 0x82 back to an é character on the screen.

Only when you use Unicode text (with a unicode string literal like u'é') does Python decode and encode the data. print writes to sys.stdout, which is configured to encode Unicode data to the current locale (or to PYTHONIOENCODING if set), so print u'é' writes that Unicode object to sys.stdout, which encodes it to bytes using the configured codec and writes those bytes to the console.
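The encoding step performed by the output stream can be imitated without a real console. The following Python 3 sketch wraps an in-memory buffer in a text stream with a chosen encoding; the helper name bytes_written is hypothetical, and the in-memory buffer is a stand-in for the console, not how sys.stdout is actually set up.

```python
import io


def bytes_written(text, encoding):
    """Write `text` through a text stream and return the bytes it emits."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="")
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # keep `raw` from being closed along with the wrapper
    return data


print(bytes_written("\u00e9", "cp850"))   # b'\x82'
print(bytes_written("\u00e9", "cp1252"))  # b'\xe9'
```

The same Unicode character leaves the stream as a different byte depending on the stream's configured codec, which is why the console code page matters for Unicode output.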

To produce the unicode object from the u'é' source code text (itself a sequence of bytes), Python does have to decode the source code it is given. For the -c command line, the bytes that are passed in are decoded as Latin-1. In the interactive console, the locale is used. So python -c "print u'é'" and print u'é' in the interactive session will produce different output.
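The decode-then-encode path can be sketched as a two-step function. This is a simplified Python 3 model, not Python 2's actual implementation: simulate_print is a made-up name, and the codec pairs below are chosen for illustration, since the answer only says that "the locale" is used in the interactive case.

```python
def simulate_print(source_bytes, source_codec, stdout_codec):
    """Decode source bytes to text, then encode the text for the output stream."""
    text = source_bytes.decode(source_codec)
    return text.encode(stdout_codec)


# -c case: the argument arrives as 0xE9, is decoded as Latin-1 to é,
# and is then encoded for a console on code page 850.
print(simulate_print(b"\xe9", "latin-1", "cp850"))  # b'\x82'

# The same 0xE9 byte decoded as code page 850 is a different character (Ú),
# so the bytes that reach the console differ.
print(simulate_print(b"\xe9", "cp850", "cp850"))    # b'\xe9'
```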

It should be noted that Python 3 uses Unicode strings throughout, and command-line parameters and environment variables are loaded into Python with the Windows 'wide' APIs, which expose the data as UTF-16, and are then presented as Unicode string objects. You can still access console data and filesystem information as byte strings, but as of Python 3.6, accessing the filesystem and the stdin/stdout/stderr streams as binary uses UTF-8 encoded data (again using the 'wide' APIs).
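For comparison with the single-byte code pages discussed earlier, this small Python 3 sketch shows how the same character looks in the two encodings mentioned here. It only illustrates the byte representations; it does not call any Windows API.

```python
text = "\u00e9"  # é as a Python 3 str

wide = text.encode("utf-16-le")  # UTF-16 code unit, the form the wide APIs use
utf8 = text.encode("utf-8")      # the form used when bytes are requested

print(wide, utf8)  # b'\xe9\x00' b'\xc3\xa9'

# Both byte forms decode back to the same str.
assert wide.decode("utf-16-le") == utf8.decode("utf-8") == text
```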
