Windows cmd encoding change causes Python crash


Problem description


First I change the Windows CMD encoding to UTF-8 and run the Python interpreter:

chcp 65001
python

Then I try to print a unicode string inside it, and when I do this Python crashes in a peculiar way (I just get a cmd prompt in the same window).

>>> import sys
>>> print u'ëèæîð'.encode(sys.stdin.encoding)

Any ideas why it happens and how to make it work?

UPD: sys.stdin.encoding returns 'cp65001'

UPD2: It just came to me that the issue might be connected with the fact that UTF-8 is a multi-byte encoding (kcwu made a good point on that). I tried running the whole example with 'windows-1250' and got 'ëeaî?'. Windows-1250 is a single-byte character set, so it worked for those characters it understands. However, I still have no idea how to make 'utf-8' work here.

UPD3: Oh, I found out it is a known Python bug. I guess what happens is that Python copies the cmd encoding as 'cp65001' to sys.stdin.encoding and tries to apply it to all the input. Since it fails to understand 'cp65001', it crashes on any input that contains non-ASCII characters.
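
To illustrate the failure mode described in UPD3, here is a minimal sketch (Python 3, unlike the Python 2 session above) of what happens when the codec registry is asked for a name it does not know. The name 'cp99999' is a made-up stand-in, since newer Python versions may already recognize 'cp65001'; the helper codec_is_known is hypothetical as well.

```python
import codecs


def codec_is_known(name):
    """Return True if Python's codec registry can resolve `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        # This is the error an unknown console code page name leads to.
        return False
    return True


print(codec_is_known("utf-8"))    # a codec every Python ships with
print(codec_is_known("cp99999"))  # hypothetical, unregistered name
```

Any encode or decode call that names an unregistered codec raises the same LookupError, which is consistent with the crash on non-ASCII input.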

Solution

Here's how to alias cp65001 to UTF-8 without changing encodings\aliases.py:

import codecs
codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)
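
As a rough check of how such a search function behaves, the following Python 3 sketch registers an alias under a made-up name, 'demo65001' (chosen so the example does not depend on whether the running interpreter already knows 'cp65001'), and confirms that encoding through the alias gives the same bytes as UTF-8. The function name alias_to_utf8 is an assumption for illustration only.

```python
import codecs


def alias_to_utf8(name):
    # The registry passes the requested name in lower case.
    # 'demo65001' is a hypothetical alias used only for this sketch.
    if name == "demo65001":
        return codecs.lookup("utf-8")
    return None  # let other search functions handle every other name


codecs.register(alias_to_utf8)

text = "ëèæîð"
encoded = text.encode("demo65001")
print(encoded == text.encode("utf-8"))
print(encoded.decode("demo65001") == text)
```

Returning None for every other name matters: the registry then falls through to the remaining search functions instead of treating the lookup as resolved.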

(IMHO, don't pay any attention to the silliness about cp65001 not being identical to UTF-8 at http://bugs.python.org/issue6058#msg97731 . It's intended to be the same, even if Microsoft's codec has some minor bugs.)

Here is some code (written for Tahoe-LAFS, tahoe-lafs.org) that makes console output work regardless of the chcp code page, and also reads Unicode command-line arguments. Credit to Michael Kaplan for the idea behind this solution. If stdout or stderr is redirected, it will output UTF-8. If you want a Byte Order Mark, you'll need to write it explicitly.
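
For the redirected case, writing the Byte Order Mark explicitly could look like the sketch below. It is Python 3, uses an in-memory io.BytesIO as a stand-in for a redirected byte stream, and the helper write_utf8 is a hypothetical name, not part of the Tahoe-LAFS code.

```python
import codecs
import io


def write_utf8(stream, text, with_bom=False):
    """Write `text` to a binary stream as UTF-8, optionally preceded by a BOM."""
    if with_bom:
        stream.write(codecs.BOM_UTF8)  # the three bytes EF BB BF
    stream.write(text.encode("utf-8"))


# io.BytesIO stands in for a redirected stdout opened in binary mode.
redirected = io.BytesIO()
write_utf8(redirected, "ëèæîð\n", with_bom=True)
print(redirected.getvalue())
```

The BOM should be written once, before any other output; calling the helper with with_bom=True a second time would put a stray BOM in the middle of the stream.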

[Edit: This version uses WriteConsoleW instead of the _O_U8TEXT flag in the MSVC runtime library, which is buggy. WriteConsoleW is also buggy relative to the MS documentation, but less so.]
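
Because WriteConsoleW may accept fewer characters than it is offered, the code writes in a loop until nothing is left. The platform-independent sketch below (Python 3) shows only that loop. write_some is a hypothetical callable standing in for WriteConsoleW, returning how many characters it accepted, and fake_console_write is a test double that accepts at most three characters per call.

```python
def write_in_chunks(write_some, text, limit=10000):
    """Feed `text` to `write_some` in pieces of at most `limit` characters.

    `write_some(chunk)` must return the number of characters it accepted.
    """
    while text:
        written = write_some(text[:limit])
        if written <= 0:
            # Mirror the original code: treat "nothing written" as an error
            # rather than looping forever.
            raise IOError("writer accepted %r characters" % (written,))
        text = text[written:]


chunks = []


def fake_console_write(chunk):
    accepted = chunk[:3]  # pretend the console takes 3 characters at a time
    chunks.append(accepted)
    return len(accepted)


write_in_chunks(fake_console_write, "ëèæîð!!", limit=5)
print(chunks)
```

The limit of 10000 is the value used in the listing; it is a workaround for the undocumented length restriction, not a documented constant.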

import sys
if sys.platform == "win32":
    import codecs
    from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_int
    from ctypes.wintypes import BOOL, HANDLE, DWORD, LPWSTR, LPCWSTR, LPVOID

    original_stderr = sys.stderr

    # If any exception occurs in this code, we'll probably try to print it on stderr,
    # which makes for frustrating debugging if stderr is directed to our wrapper.
    # So be paranoid about catching errors and reporting them to original_stderr,
    # so that we can at least see them.
    def _complain(message):
        print >>original_stderr, message if isinstance(message, str) else repr(message)

    # Work around <http://bugs.python.org/issue6058>.
    codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)

    # Make Unicode console output work independently of the current code page.
    # This also fixes <http://bugs.python.org/issue1602>.
    # Credit to Michael Kaplan <http://www.siao2.com/2010/04/07/9989346.aspx>
    # and TZOmegaTZIOY
    # <http://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/1432462#1432462>.
    try:
        # <http://msdn.microsoft.com/en-us/library/ms683231(VS.85).aspx>
        # HANDLE WINAPI GetStdHandle(DWORD nStdHandle);
        # returns INVALID_HANDLE_VALUE, NULL, or a valid handle
        #
        # <http://msdn.microsoft.com/en-us/library/aa364960(VS.85).aspx>
        # DWORD WINAPI GetFileType(HANDLE hFile);
        #
        # <http://msdn.microsoft.com/en-us/library/ms683167(VS.85).aspx>
        # BOOL WINAPI GetConsoleMode(HANDLE hConsole, LPDWORD lpMode);

        GetStdHandle = WINFUNCTYPE(HANDLE, DWORD)(("GetStdHandle", windll.kernel32))
        STD_OUTPUT_HANDLE = DWORD(-11)
        STD_ERROR_HANDLE = DWORD(-12)
        GetFileType = WINFUNCTYPE(DWORD, HANDLE)(("GetFileType", windll.kernel32))
        FILE_TYPE_CHAR = 0x0002
        FILE_TYPE_REMOTE = 0x8000
        GetConsoleMode = WINFUNCTYPE(BOOL, HANDLE, POINTER(DWORD))(("GetConsoleMode", windll.kernel32))
        INVALID_HANDLE_VALUE = DWORD(-1).value

        def not_a_console(handle):
            if handle == INVALID_HANDLE_VALUE or handle is None:
                return True
            return ((GetFileType(handle) & ~FILE_TYPE_REMOTE) != FILE_TYPE_CHAR
                    or GetConsoleMode(handle, byref(DWORD())) == 0)

        old_stdout_fileno = None
        old_stderr_fileno = None
        if hasattr(sys.stdout, 'fileno'):
            old_stdout_fileno = sys.stdout.fileno()
        if hasattr(sys.stderr, 'fileno'):
            old_stderr_fileno = sys.stderr.fileno()

        STDOUT_FILENO = 1
        STDERR_FILENO = 2
        real_stdout = (old_stdout_fileno == STDOUT_FILENO)
        real_stderr = (old_stderr_fileno == STDERR_FILENO)

        if real_stdout:
            hStdout = GetStdHandle(STD_OUTPUT_HANDLE)
            if not_a_console(hStdout):
                real_stdout = False

        if real_stderr:
            hStderr = GetStdHandle(STD_ERROR_HANDLE)
            if not_a_console(hStderr):
                real_stderr = False

        if real_stdout or real_stderr:
            # BOOL WINAPI WriteConsoleW(HANDLE hOutput, LPWSTR lpBuffer, DWORD nChars,
            #                           LPDWORD lpCharsWritten, LPVOID lpReserved);

            WriteConsoleW = WINFUNCTYPE(BOOL, HANDLE, LPWSTR, DWORD, POINTER(DWORD), LPVOID)(("WriteConsoleW", windll.kernel32))

            class UnicodeOutput:
                def __init__(self, hConsole, stream, fileno, name):
                    self._hConsole = hConsole
                    self._stream = stream
                    self._fileno = fileno
                    self.closed = False
                    self.softspace = False
                    self.mode = 'w'
                    self.encoding = 'utf-8'
                    self.name = name
                    self.flush()

                def isatty(self):
                    return False

                def close(self):
                    # don't really close the handle, that would only cause problems
                    self.closed = True

                def fileno(self):
                    return self._fileno

                def flush(self):
                    if self._hConsole is None:
                        try:
                            self._stream.flush()
                        except Exception as e:
                            _complain("%s.flush: %r from %r" % (self.name, e, self._stream))
                            raise

                def write(self, text):
                    try:
                        if self._hConsole is None:
                            if isinstance(text, unicode):
                                text = text.encode('utf-8')
                            self._stream.write(text)
                        else:
                            if not isinstance(text, unicode):
                                text = str(text).decode('utf-8')
                            remaining = len(text)
                            while remaining:
                                n = DWORD(0)
                                # There is a shorter-than-documented limitation on the
                                # length of the string passed to WriteConsoleW (see
                                # <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232>).
                                retval = WriteConsoleW(self._hConsole, text, min(remaining, 10000), byref(n), None)
                                if retval == 0 or n.value == 0:
                                    raise IOError("WriteConsoleW returned %r, n.value = %r" % (retval, n.value))
                                remaining -= n.value
                                if not remaining:
                                    break
                                text = text[n.value:]
                    except Exception as e:
                        _complain("%s.write: %r" % (self.name, e))
                        raise

                def writelines(self, lines):
                    try:
                        for line in lines:
                            self.write(line)
                    except Exception as e:
                        _complain("%s.writelines: %r" % (self.name, e))
                        raise

            if real_stdout:
                sys.stdout = UnicodeOutput(hStdout, None, STDOUT_FILENO, '<Unicode console stdout>')
            else:
                sys.stdout = UnicodeOutput(None, sys.stdout, old_stdout_fileno, '<Unicode redirected stdout>')

            if real_stderr:
                sys.stderr = UnicodeOutput(hStderr, None, STDERR_FILENO, '<Unicode console stderr>')
            else:
                sys.stderr = UnicodeOutput(None, sys.stderr, old_stderr_fileno, '<Unicode redirected stderr>')
    except Exception as e:
        _complain("exception %r while fixing up sys.stdout and sys.stderr" % (e,))


    # While we're at it, let's unmangle the command-line arguments:

    # This works around <http://bugs.python.org/issue2128>.
    GetCommandLineW = WINFUNCTYPE(LPWSTR)(("GetCommandLineW", windll.kernel32))
    CommandLineToArgvW = WINFUNCTYPE(POINTER(LPWSTR), LPCWSTR, POINTER(c_int))(("CommandLineToArgvW", windll.shell32))

    argc = c_int(0)
    argv_unicode = CommandLineToArgvW(GetCommandLineW(), byref(argc))

    argv = [argv_unicode[i].encode('utf-8') for i in xrange(0, argc.value)]

    if not hasattr(sys, 'frozen'):
        # If this is an executable produced by py2exe or bbfreeze, then it will
        # have been invoked directly. Otherwise, argv_unicode[0] is the Python
        # interpreter, so skip that.
        argv = argv[1:]

        # Also skip option arguments to the Python interpreter.
        while len(argv) > 0:
            arg = argv[0]
            if not arg.startswith(u"-") or arg == u"-":
                break
            argv = argv[1:]
            if arg == u'-m':
                # sys.argv[0] should really be the absolute path of the module source,
                # but never mind
                break
            if arg == u'-c':
                argv[0] = u'-c'
                break

    # if you like:
    sys.argv = argv
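
The option-skipping logic at the end of the listing can be isolated as a pure function, which makes it easier to test off Windows. This is a Python 3 sketch of the same loop; the function name strip_interpreter_options is an assumption, and the input is an ordinary list rather than the result of CommandLineToArgvW.

```python
def strip_interpreter_options(full_argv):
    """Drop the interpreter path and its option arguments from `full_argv`."""
    argv = list(full_argv[1:])  # skip the Python interpreter itself
    while argv:
        arg = argv[0]
        if not arg.startswith("-") or arg == "-":
            break  # first non-option argument: the script (or "-" for stdin)
        argv = argv[1:]
        if arg == "-m":
            break  # what follows is the module name and its arguments
        if arg == "-c":
            argv[0] = "-c"  # replace the command string, as sys.argv does
            break
    return argv


print(strip_interpreter_options(["python", "-u", "script.py", "x"]))
print(strip_interpreter_options(["python", "-m", "mod", "a"]))
print(strip_interpreter_options(["python", "-c", "print(1)", "a"]))
```

Like the original loop, this assumes that interpreter options take no separate value argument (apart from -m and -c), and that -c is always followed by a command string.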

Finally, it is possible to grant ΤΖΩΤΖΙΟΥ's wish to use DejaVu Sans Mono, which I agree is an excellent font, for the console.

You can find information on the font requirements and how to add new fonts for the Windows console in the Microsoft KB article 'Necessary criteria for fonts to be available in a command window'.

But basically, on Vista (probably also Win7):

  • under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont, set "0" to "DejaVu Sans Mono";
  • for each of the subkeys under HKEY_CURRENT_USER\Console, set "FaceName" to "DejaVu Sans Mono".
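
Before editing anything, it can help to see which subkeys the second step would touch. The read-only Python 3 sketch below lists the subkeys of HKEY_CURRENT_USER\Console with the standard winreg module. The function name console_subkeys is hypothetical, and off Windows the function simply returns an empty list.

```python
import sys


def console_subkeys():
    """Return the names of the subkeys under HKEY_CURRENT_USER\\Console."""
    if sys.platform != "win32":
        return []  # the registry only exists on Windows
    import winreg  # imported here so the module loads on other platforms

    names = []
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, "Console") as key:
        subkey_count = winreg.QueryInfoKey(key)[0]
        for index in range(subkey_count):
            names.append(winreg.EnumKey(key, index))
    return names


for name in console_subkeys():
    print(name)
```

This only reads the registry; setting "FaceName" on each listed subkey is left as a manual step, as in the instructions above.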

On XP, check the thread 'Changing Command Prompt fonts?' in LockerGnome forums.
