Windows cmd encoding change causes Python crash


Problem description

First I change the Windows CMD encoding to UTF-8 and run the Python interpreter:

chcp 65001
python

Then I try to print a Unicode string inside it, and when I do this Python crashes in a peculiar way (I just get a cmd prompt in the same window).

>>> import sys
>>> print u'ëèæîð'.encode(sys.stdin.encoding)

Any ideas why it happens and how to make it work?

UPD: sys.stdin.encoding returns 'cp65001'

UPD2: It just came to me that the issue might be connected with the fact that UTF-8 is a multi-byte encoding (kcwu made a good point on that). I tried running the whole example with 'windows-1250' and got 'ëeaî?'. Windows-1250 is a single-byte character set, so it worked for those characters it understands. However, I still have no idea how to make 'utf-8' work here.
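The byte-width difference mentioned in UPD2 can be checked directly. This is a minimal sketch, written for Python 3 so that it runs on a current interpreter (the question itself uses Python 2). It uses Python's own 'cp1250' codec with the 'replace' error handler, which is an assumption: it substitutes '?' for unrepresentable characters, whereas the console in the question produced 'ëeaî?', presumably through its own conversion.

```python
text = u'ëèæîð'

# Every character in this string needs two bytes in UTF-8.
utf8_bytes = text.encode('utf-8')

# cp1250 (Windows-1250) uses one byte per character; 'replace' turns
# characters the code page cannot represent into '?'.
cp1250_bytes = text.encode('cp1250', 'replace')

print(len(text), len(utf8_bytes), len(cp1250_bytes))
print(repr(cp1250_bytes))
```

Only 'ë' and 'î' exist in Windows-1250, so the other three characters are replaced.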

UPD3: Oh, I found out it is a known Python bug (http://bugs.python.org/issue1602). I guess what happens is that Python copies the cmd encoding as 'cp65001' to sys.stdin.encoding and tries to apply it to all the input. Since it fails to understand 'cp65001', it crashes on any input that contains non-ASCII characters.

Solution

Here's how to alias cp65001 to UTF-8 without changing encodings\aliases.py:

import codecs
codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)
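To see the aliasing mechanism in isolation, here is a small sketch written for Python 3. Recent Python 3 releases already recognize cp65001, so the sketch registers a made-up alias instead; the name `utf8demoalias` and the helper `_alias_search` are hypothetical and exist only for this illustration.

```python
import codecs

ALIAS = 'utf8demoalias'  # hypothetical name; no built-in codec is called this


def _alias_search(name):
    # A codec search function receives the normalized (lower-cased) encoding
    # name and returns a CodecInfo, or None so other search functions can try.
    if name == ALIAS:
        return codecs.lookup('utf-8')
    return None


# Before registration the name is unknown, just as 'cp65001' was in the question.
try:
    codecs.lookup(ALIAS)
    known_before = True
except LookupError:
    known_before = False

codecs.register(_alias_search)

encoded = u'ëèæîð'.encode(ALIAS)
print(known_before, encoded == u'ëèæîð'.encode('utf-8'))
```

The one-line lambda above does the same thing for 'cp65001'; the named function only makes the contract of a search function easier to read.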

(IMHO, don't pay any attention to the silliness about cp65001 not being identical to UTF-8 at http://bugs.python.org/issue6058#msg97731 . It's intended to be the same, even if Microsoft's codec has some minor bugs.)

Here is some code (written for Tahoe-LAFS, tahoe-lafs.org) that makes console output work regardless of the chcp code page, and also reads Unicode command-line arguments. Credit to Michael Kaplan for the idea behind this solution. If stdout or stderr is redirected, it will output UTF-8. If you want a Byte Order Mark, you'll need to write it explicitly.
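Writing the Byte Order Mark explicitly could look like the following sketch (Python 3). The helper name `write_utf8_with_bom` is hypothetical, and an in-memory buffer stands in for a redirected stdout opened in binary mode.

```python
import codecs
import io


def write_utf8_with_bom(stream, text):
    """Write the UTF-8 BOM, then the UTF-8 encoded text, to a binary stream."""
    stream.write(codecs.BOM_UTF8)
    stream.write(text.encode('utf-8'))


buf = io.BytesIO()  # stand-in for a redirected, binary-mode stdout
write_utf8_with_bom(buf, u'ëèæîð\n')
data = buf.getvalue()
print(repr(data[:3]))
```

The BOM should be written once, at the very start of the output, before any other data.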

[Edit: This version uses WriteConsoleW instead of the _O_U8TEXT flag in the MSVC runtime library, which is buggy. WriteConsoleW is also buggy relative to the MS documentation, but less so.]
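The listing in this answer copes with the WriteConsoleW limitation by passing at most 10000 characters per call and retrying with whatever was not written. The loop can be sketched on its own, without any Windows API (Python 3). Here `write_all` and `fake_writer` are hypothetical names, and `fake_writer` merely imitates a writer that accepts fewer characters than it is offered.

```python
def write_all(text, write_some, max_chars=10000):
    """Feed text to write_some in pieces until all of it has been written.

    write_some(chunk) stands in for WriteConsoleW: it returns the number of
    characters it actually consumed, which may be fewer than it was given.
    """
    calls = 0
    while text:
        written = write_some(text[:max_chars])
        if written <= 0:
            raise IOError("writer made no progress")
        text = text[written:]  # keep only the part that is still unwritten
        calls += 1
    return calls


chunks = []


def fake_writer(chunk):
    # Pretend the console accepts at most 4 characters per call.
    accepted = chunk[:4]
    chunks.append(accepted)
    return len(accepted)


calls = write_all(u'abcdefghij', fake_writer, max_chars=6)
print(calls, chunks)
```

Raising when no progress is made matters: without that check, a writer that keeps returning 0 would make the loop spin forever.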

import sys
if sys.platform == "win32":
    import codecs
    from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_int
    from ctypes.wintypes import BOOL, HANDLE, DWORD, LPWSTR, LPCWSTR, LPVOID

    original_stderr = sys.stderr

    # If any exception occurs in this code, we'll probably try to print it on stderr,
    # which makes for frustrating debugging if stderr is directed to our wrapper.
    # So be paranoid about catching errors and reporting them to original_stderr,
    # so that we can at least see them.
    def _complain(message):
        print >>original_stderr, message if isinstance(message, str) else repr(message)

    # Work around <http://bugs.python.org/issue6058>.
    codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)

    # Make Unicode console output work independently of the current code page.
    # This also fixes <http://bugs.python.org/issue1602>.
    # Credit to Michael Kaplan <http://www.siao2.com/2010/04/07/9989346.aspx>
    # and TZOmegaTZIOY
    # <http://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/1432462#1432462>.
    try:
        # <http://msdn.microsoft.com/en-us/library/ms683231(VS.85).aspx>
        # HANDLE WINAPI GetStdHandle(DWORD nStdHandle);
        # returns INVALID_HANDLE_VALUE, NULL, or a valid handle
        #
        # <http://msdn.microsoft.com/en-us/library/aa364960(VS.85).aspx>
        # DWORD WINAPI GetFileType(HANDLE hFile);
        #
        # <http://msdn.microsoft.com/en-us/library/ms683167(VS.85).aspx>
        # BOOL WINAPI GetConsoleMode(HANDLE hConsole, LPDWORD lpMode);

        GetStdHandle = WINFUNCTYPE(HANDLE, DWORD)(("GetStdHandle", windll.kernel32))
        STD_OUTPUT_HANDLE = DWORD(-11)
        STD_ERROR_HANDLE = DWORD(-12)
        # GetFileType takes a HANDLE; declaring it as DWORD would truncate
        # pointer-sized handles on 64-bit Windows.
        GetFileType = WINFUNCTYPE(DWORD, HANDLE)(("GetFileType", windll.kernel32))
        FILE_TYPE_CHAR = 0x0002
        FILE_TYPE_REMOTE = 0x8000
        GetConsoleMode = WINFUNCTYPE(BOOL, HANDLE, POINTER(DWORD))(("GetConsoleMode", windll.kernel32))
        INVALID_HANDLE_VALUE = DWORD(-1).value

        def not_a_console(handle):
            if handle == INVALID_HANDLE_VALUE or handle is None:
                return True
            return ((GetFileType(handle) & ~FILE_TYPE_REMOTE) != FILE_TYPE_CHAR
                    or GetConsoleMode(handle, byref(DWORD())) == 0)

        old_stdout_fileno = None
        old_stderr_fileno = None
        if hasattr(sys.stdout, 'fileno'):
            old_stdout_fileno = sys.stdout.fileno()
        if hasattr(sys.stderr, 'fileno'):
            old_stderr_fileno = sys.stderr.fileno()

        STDOUT_FILENO = 1
        STDERR_FILENO = 2
        real_stdout = (old_stdout_fileno == STDOUT_FILENO)
        real_stderr = (old_stderr_fileno == STDERR_FILENO)

        if real_stdout:
            hStdout = GetStdHandle(STD_OUTPUT_HANDLE)
            if not_a_console(hStdout):
                real_stdout = False

        if real_stderr:
            hStderr = GetStdHandle(STD_ERROR_HANDLE)
            if not_a_console(hStderr):
                real_stderr = False

        if real_stdout or real_stderr:
            # BOOL WINAPI WriteConsoleW(HANDLE hOutput, LPWSTR lpBuffer, DWORD nChars,
            #                           LPDWORD lpCharsWritten, LPVOID lpReserved);

            WriteConsoleW = WINFUNCTYPE(BOOL, HANDLE, LPWSTR, DWORD, POINTER(DWORD), LPVOID)(("WriteConsoleW", windll.kernel32))

            class UnicodeOutput:
                def __init__(self, hConsole, stream, fileno, name):
                    self._hConsole = hConsole
                    self._stream = stream
                    self._fileno = fileno
                    self.closed = False
                    self.softspace = False
                    self.mode = 'w'
                    self.encoding = 'utf-8'
                    self.name = name
                    self.flush()

                def isatty(self):
                    return False

                def close(self):
                    # don't really close the handle, that would only cause problems
                    self.closed = True

                def fileno(self):
                    return self._fileno

                def flush(self):
                    if self._hConsole is None:
                        try:
                            self._stream.flush()
                        except Exception as e:
                            _complain("%s.flush: %r from %r" % (self.name, e, self._stream))
                            raise

                def write(self, text):
                    try:
                        if self._hConsole is None:
                            if isinstance(text, unicode):
                                text = text.encode('utf-8')
                            self._stream.write(text)
                        else:
                            if not isinstance(text, unicode):
                                text = str(text).decode('utf-8')
                            remaining = len(text)
                            while remaining:
                                n = DWORD(0)
                                # There is a shorter-than-documented limitation on the
                                # length of the string passed to WriteConsoleW (see
                                # <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232>).
                                retval = WriteConsoleW(self._hConsole, text, min(remaining, 10000), byref(n), None)
                                if retval == 0 or n.value == 0:
                                    raise IOError("WriteConsoleW returned %r, n.value = %r" % (retval, n.value))
                                remaining -= n.value
                                if not remaining:
                                    break
                                text = text[n.value:]
                    except Exception as e:
                        _complain("%s.write: %r" % (self.name, e))
                        raise

                def writelines(self, lines):
                    try:
                        for line in lines:
                            self.write(line)
                    except Exception as e:
                        _complain("%s.writelines: %r" % (self.name, e))
                        raise

            if real_stdout:
                sys.stdout = UnicodeOutput(hStdout, None, STDOUT_FILENO, '<Unicode console stdout>')
            else:
                sys.stdout = UnicodeOutput(None, sys.stdout, old_stdout_fileno, '<Unicode redirected stdout>')

            if real_stderr:
                sys.stderr = UnicodeOutput(hStderr, None, STDERR_FILENO, '<Unicode console stderr>')
            else:
                sys.stderr = UnicodeOutput(None, sys.stderr, old_stderr_fileno, '<Unicode redirected stderr>')
    except Exception as e:
        _complain("exception %r while fixing up sys.stdout and sys.stderr" % (e,))


    # While we're at it, let's unmangle the command-line arguments:

    # This works around <http://bugs.python.org/issue2128>.
    GetCommandLineW = WINFUNCTYPE(LPWSTR)(("GetCommandLineW", windll.kernel32))
    CommandLineToArgvW = WINFUNCTYPE(POINTER(LPWSTR), LPCWSTR, POINTER(c_int))(("CommandLineToArgvW", windll.shell32))

    argc = c_int(0)
    argv_unicode = CommandLineToArgvW(GetCommandLineW(), byref(argc))

    argv = [argv_unicode[i].encode('utf-8') for i in xrange(0, argc.value)]

    if not hasattr(sys, 'frozen'):
        # If this is an executable produced by py2exe or bbfreeze, then it will
        # have been invoked directly. Otherwise, argv_unicode[0] is the Python
        # interpreter, so skip that.
        argv = argv[1:]

        # Also skip option arguments to the Python interpreter.
        while len(argv) > 0:
            arg = argv[0]
            if not arg.startswith(u"-") or arg == u"-":
                break
            argv = argv[1:]
            if arg == u'-m':
                # sys.argv[0] should really be the absolute path of the module source,
                # but never mind
                break
            if arg == u'-c':
                argv[0] = u'-c'
                break

    # if you like:
    sys.argv = argv
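The argument-skipping logic at the end of the listing can be tried out on any platform when it is pulled into a plain function. This is a sketch for Python 3; the function name `strip_interpreter_args` and its `frozen` parameter are hypothetical, and the extra check before overwriting `argv[0]` is an addition that the listing does not have.

```python
def strip_interpreter_args(argv, frozen=False):
    """Return the arguments a script would expect to find in sys.argv.

    argv is the full command line split into arguments, interpreter first.
    frozen mirrors hasattr(sys, 'frozen') in the listing above.
    """
    if frozen:
        return list(argv)
    argv = list(argv[1:])  # drop the interpreter itself
    while argv:
        arg = argv[0]
        if not arg.startswith('-') or arg == '-':
            break  # reached the script name (or '-' for stdin)
        argv = argv[1:]
        if arg == '-m':
            break  # what follows is the module name and its arguments
        if arg == '-c':
            if argv:
                argv[0] = '-c'  # sys.argv[0] is '-c' when running a command
            break
    return argv


script_case = strip_interpreter_args(['python', '-u', 'script.py', u'ë'])
command_case = strip_interpreter_args(['python', '-c', 'print(1)', 'x'])
module_case = strip_interpreter_args(['python', '-m', 'mod', 'a'])
frozen_case = strip_interpreter_args(['app.exe', 'a'], frozen=True)
print(script_case, command_case, module_case, frozen_case)
```

Like the listing, this treats every leading option as a single argument, so an interpreter option that takes a separate value (for example `-W ignore`) would be misread as the start of the script's own arguments.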

Finally, it is possible to grant ΤΖΩΤΖΙΟΥ's wish to use DejaVu Sans Mono, which I agree is an excellent font, for the console.

You can find information on the font requirements, and on how to add new fonts for the Windows console, in the Microsoft KB article 'Necessary criteria for fonts to be available in a command window'.

But basically, on Vista (probably also Win7):

  • under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont, set "0" to "DejaVu Sans Mono";
  • for each of the subkeys under HKEY_CURRENT_USER\Console, set "FaceName" to "DejaVu Sans Mono".

On XP, check the thread 'Changing Command Prompt fonts?' in LockerGnome forums.
