Python os.walk and Japanese filename crash


Question


Possible Duplicate:
Python, Unicode, and the Windows console

I have a folder containing a file named "01 - ナナナン塊.txt".

I open Python at the interactive prompt in the same folder as the file and attempt to walk the folder hierarchy:

Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> for x in os.walk('.'):
...     print(x)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\dev\Python31\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 17-21: character maps to <undefined>

Clearly the encoding I'm using isn't able to deal with Japanese characters. Fine. But Python 3.1 is meant to be Unicode all the way down, as I understand it, so I'm at a loss as to what I'm meant to do with this. Does anyone have any ideas?

Solution

It seems like all answers so far are from Unix people who assume the Windows console is like a Unix terminal, which it is not.

The problem is that you can't write Unicode output to the Windows console using the normal underlying file I/O functions. The Windows API WriteConsole needs to be used. Python should probably be doing this transparently, but as of the Python 3.1 in the question it isn't.
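The traceback in the question can be reproduced without a Windows console. The following is a minimal sketch, assuming the console code page is cp850 as shown in the traceback; `fake_console` is a hypothetical stand-in for the console's byte stream, not something Python creates itself. The `backslashreplace` fallback at the end is a lossy workaround added for illustration, not the approach this answer proposes.

```python
import io

# Hypothetical stand-in for sys.stdout attached to a cp850 console.
fake_console = io.TextIOWrapper(io.BytesIO(), encoding="cp850", newline="\n")

name = "01 - ナナナン塊.txt"

failed = False
bad = ""
try:
    # The text layer must encode to cp850 before any bytes are written.
    fake_console.write(name)
    fake_console.flush()
except UnicodeEncodeError as exc:
    failed = True
    # The exception records which slice of the string could not be encoded.
    bad = name[exc.start:exc.end]

# Lossy fallback (an assumption, not part of the answer): escape whatever
# the code page cannot represent instead of raising.
escaped = name.encode("cp850", errors="backslashreplace").decode("cp850")
```

The failure happens in the text layer, before anything reaches the console, which is why the error mentions a codec rather than the console itself.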

There's a different problem if you redirect the output to a file: Windows text files are historically in the ANSI codepage, not Unicode. You can fairly safely write UTF-8 to text files in Windows these days, but Python doesn't do that by default.

I think it should do these things, but here's some code to make it happen. You don't have to worry about the details if you don't want to; just call ConsoleFile.wrap_standard_handles(). You do need PyWin installed to get access to the necessary APIs.

import os, sys, io, win32api, win32console, pywintypes

def change_file_encoding(f, encoding):
    """
    TextIOWrapper is missing a way to change the file encoding, so we have to
    do it by creating a new one.
    """

    errors = f.errors
    line_buffering = f.line_buffering
    # f.newlines is not the same as the newline parameter to TextIOWrapper.
    # newlines = f.newlines

    buf = f.detach()

    # TextIOWrapper defaults newline to \r\n on Windows, even though the underlying
    # file object is already doing that for us.  We need to explicitly say "\n" to
    # make sure we don't output \r\r\n; this is the same as the internal function
    # create_stdio.
    return io.TextIOWrapper(buf, encoding, errors, "\n", line_buffering)


class ConsoleFile:
    class FileNotConsole(Exception): pass

    def __init__(self, handle):
        handle = win32api.GetStdHandle(handle)
        self.screen = win32console.PyConsoleScreenBufferType(handle)
        try:
            self.screen.GetConsoleMode()
        except pywintypes.error:
            raise ConsoleFile.FileNotConsole

    def write(self, s):
        self.screen.WriteConsole(s)

    def close(self): pass
    def flush(self): pass
    def isatty(self): return True

    @staticmethod
    def wrap_standard_handles():
        sys.stdout.flush()
        try:
            # There seems to be no binding for _get_osfhandle.
            sys.stdout = ConsoleFile(win32api.STD_OUTPUT_HANDLE)
        except ConsoleFile.FileNotConsole:
            sys.stdout = change_file_encoding(sys.stdout, "utf-8")

        sys.stderr.flush()
        try:
            sys.stderr = ConsoleFile(win32api.STD_ERROR_HANDLE)
        except ConsoleFile.FileNotConsole:
            sys.stderr = change_file_encoding(sys.stderr, "utf-8")

ConsoleFile.wrap_standard_handles()

print("English 漢字 Кири́ллица")

This is a little tricky: if stdout or stderr is the console, we need to output with WriteConsole; but if it's not (e.g., foo.py > file), that's not going to work, and we need to change the file's encoding to UTF-8 instead.

The opposite in either case will not work. You can't output to a regular file with WriteConsole (it's not actually a byte API, but a UTF-16 one; PyWin hides this detail), and you can't write UTF-8 to a Windows console.
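The file half of that logic can be checked in isolation, without PyWin or a console. This is a minimal sketch that mirrors change_file_encoding above; `rewrap_as_utf8`, `backing`, and `redirected` are hypothetical names used only for illustration, and the in-memory buffer stands in for a redirected stdout that was opened with a legacy code page.

```python
import io

def rewrap_as_utf8(stream):
    """Return a new UTF-8 text wrapper around stream's underlying buffer."""
    stream.flush()
    errors = stream.errors
    line_buffering = stream.line_buffering
    raw = stream.detach()  # the old wrapper is unusable after this call
    return io.TextIOWrapper(raw, encoding="utf-8", errors=errors,
                            newline="\n", line_buffering=line_buffering)

# Hypothetical stand-in for stdout redirected to a file (foo.py > file).
backing = io.BytesIO()
redirected = io.TextIOWrapper(backing, encoding="cp850", newline="\n")

# Swap the encoding, then write text that cp850 could not represent.
redirected = rewrap_as_utf8(redirected)
redirected.write("English 漢字 Кириллица\n")
redirected.flush()

written = backing.getvalue()  # the bytes that would land in the file
```

On Python 3.7 and later, `TextIOWrapper.reconfigure(encoding="utf-8")` changes the encoding in place, so the detach-and-rewrap step is only needed on older versions such as the 3.1 in the question.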

Also, it really should be using _get_osfhandle to get the handle to stdout and stderr, rather than assuming they're assigned to the standard handles, but that API doesn't seem to have any PyWin binding.
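Although PyWin may not bind it, the standard library's Windows-only msvcrt module does expose get_osfhandle. The following is a hedged sketch of how the handle lookup could be done with it; `os_handle_for` is a hypothetical helper, it is not wired into ConsoleFile, and whether PyConsoleScreenBufferType accepts the resulting integer handle is an assumption that is not verified here.

```python
import sys
import tempfile

def os_handle_for(stream):
    """Return the Windows OS handle behind a Python file object,
    or None when not running on Windows."""
    if sys.platform != "win32":
        return None
    import msvcrt  # Windows-only standard library module
    # Map the C runtime file descriptor to the underlying OS handle.
    return msvcrt.get_osfhandle(stream.fileno())

# Demonstrate on a temporary file so the sketch runs on any platform.
with tempfile.TemporaryFile() as tmp:
    handle = os_handle_for(tmp)
```

Looking the handle up from the file object's own descriptor avoids assuming that sys.stdout and sys.stderr are still attached to the process's standard handles.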
