Handling UTF filenames in Python

Question

I've read quite a bit on the topic already, including what seems to be the definitive guide on this topic here: http://docs.python.org/howto/unicode.html

Perhaps for a more experienced developer, that guide may be enough. However, in my case, I'm more confused than when I started and still haven't resolved my issue.

I am trying to read filenames using os.walk() and to obtain certain information about the files (such as filesize) before writing that information to a text file. This works as long as I don't run into any files with filenames encoded in utf. When it hits a file with a utf encoded name I get an error like this one:

WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'Documents\\??.txt'

In that case, the file was named 唽咿.txt.

Here is how I have been trying to do it so far:

for (root, dirs, files) in os.walk(dirpath):
    for filename in files:
        filepath = os.path.join(root, filename)
        filesize = os.stat(filepath).st_size
        file = open(filepath, 'rb')
        stuff = get_stuff(filesize, file)
        file.close()

In case it matters, dirpath came from an earlier portion of code that amounts to 'dirpath = raw_input()'.

I've tried various things such as changing the filepath line to:

filepath = unicode(os.path.join(unicode(root), unicode(filename)))

but nothing I have tried has worked.

Here are my two questions:

  1. How can I get it to pass the correct filename to the os.stat() method so that I can get a correct response from it?

  2. My script needs to write some filenames into a text file that it may later want to read from. At that point it needs to be able to find the file based on what it just read from the text file. How do I write such filenames to a text file properly and then read from it properly later?

Recommended answer

For those interested in the full solution:

dirpath = raw_input()

was changed to:

dirpath = raw_input().decode(sys.stdin.encoding)

That allowed for the argument being passed to os.walk() to be in unicode, causing the filenames it returned to also be in unicode.
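As a rough illustration of that idea, here is a minimal sketch written to run on both Python 2 and Python 3. The helper names to_text, walk_sizes and demo are hypothetical and not part of the original script, and the demo decodes the temporary directory path with sys.getfilesystemencoding(), which is an assumption made only for this example (the fix above decodes user input with sys.stdin.encoding).

```python
# -*- coding: utf-8 -*-
import os
import shutil
import sys
import tempfile


def to_text(path, encoding):
    """Decode a byte-string path to text; text paths pass through unchanged."""
    if isinstance(path, bytes):
        return path.decode(encoding)
    return path


def walk_sizes(dirpath):
    """Map each file path under dirpath to its size in bytes.

    dirpath should already be text (unicode), so that os.walk() also
    returns text names that os.stat() can resolve.
    """
    sizes = {}
    for root, dirs, files in os.walk(dirpath):
        for filename in files:
            filepath = os.path.join(root, filename)
            sizes[filepath] = os.stat(filepath).st_size
    return sizes


def demo():
    """Hypothetical demo: create the file from the question and stat it."""
    # Assumption: the temp path is in the filesystem encoding on Python 2.
    tmp = to_text(tempfile.mkdtemp(), sys.getfilesystemencoding())
    try:
        name = u"唽咿.txt"
        with open(os.path.join(tmp, name), "wb") as handle:
            handle.write(b"abc")
        # Report base names only, since the temp directory name varies.
        return {os.path.basename(p): s for p, s in walk_sizes(tmp).items()}
    finally:
        shutil.rmtree(tmp)
```

On Python 2, to_text(raw_input(), sys.stdin.encoding) would play the role of the changed line above; on Python 3, input() already returns text, so to_text leaves it untouched.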

To write these names to a file and read them back later (my second question), I used codecs.open().
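A minimal sketch of that round trip might look like the following. The function names write_names and read_names are hypothetical, and both the UTF-8 encoding and the one-name-per-line layout are assumptions for illustration; the layout would not survive a filename that itself contains a newline.

```python
# -*- coding: utf-8 -*-
import codecs


def write_names(list_path, names, encoding="utf-8"):
    """Write unicode filenames to list_path, one per line."""
    with codecs.open(list_path, "w", encoding=encoding) as out:
        for name in names:
            out.write(name + u"\n")


def read_names(list_path, encoding="utf-8"):
    """Read the filenames back as unicode strings, skipping empty lines."""
    with codecs.open(list_path, "r", encoding=encoding) as src:
        return [name for name in src.read().split(u"\n") if name]
```

Because the names come back as unicode, they can be passed straight to os.stat() or open() in the same way as the names returned by os.walk().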
