Python not able to open a file with non-English characters in its path
Problem description
I have a file with the following path: D:/bar/クレイジー・ヒッツ!/foo.abc
I am parsing the path from an XML file and storing it in a variable called path, in the form file://localhost/D:/bar/クレイジー・ヒッツ!/foo.abc
Then the following operations are performed:
import urllib

path = path.strip()
path = path[17:]  # remove the file://localhost/ prefix
path = urllib.url2pathname(path)
path = urllib.unquote(path)
The error is:
IOError: [Errno 2] No such file or directory: 'D:\\bar\\\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81\\foo.abc'
Update 1: I am using Python 2.7 on Windows 7.
Recommended answer
The offending path segment in the error is:
'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
I think this is the UTF-8 encoded version of your folder name.
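A quick way to check that guess is to encode the folder name as UTF-8 and compare the result with the bytes from the error message. The name is written with \u escapes (the same ones os.listdir(u'.') returns further down) so the snippet does not depend on the source file's encoding; folder_name and error_bytes are illustrative variable names, and the snippet should run under both Python 2.7 and Python 3.

```python
# The folder name from the question, as unicode code points
folder_name = u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01'

# The raw bytes that appear in the IOError message
error_bytes = (b'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc'
               b'\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81')

# Encoding the name as UTF-8 reproduces exactly those bytes...
assert folder_name.encode('utf-8') == error_bytes
# ...and decoding the bytes as UTF-8 recovers the name
assert error_bytes.decode('utf-8') == folder_name

# Ten characters, three UTF-8 bytes each
print(len(error_bytes))  # 30
```

Because every character here takes three bytes in UTF-8, a byte string that Windows interprets in its ANSI code page no longer names the folder on disk.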
I created a folder with the same name on Windows 7 and placed a file called 'abc.txt' in it:
>>> import os
>>> a = '\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
>>> os.listdir('.')
['?????\xb7???!']
>>> os.listdir(u'.') # Pass unicode to have unicode returned to you
[u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01']
>>>
>>> a.decode('utf8') # UTF8 decoding your string matches the listdir output
u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01'
>>> os.listdir(a.decode('utf8'))
[u'abc.txt']
So it seems that Duncan's suggestion of path.decode('utf8') does the trick.
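Duncan's suggestion amounts to decoding the byte string into unicode before it reaches open() or os.listdir(). Below is a minimal sketch of that step as a helper; as_unicode_path is a hypothetical name, and the UTF-8 default is an assumption based on the experiment above. It should run under Python 2.7 (where bytes is an alias of str) and Python 3.

```python
def as_unicode_path(path, encoding='utf-8'):
    """Return path as unicode text, decoding it first if it is a byte string.

    UTF-8 is assumed because that is what the XML-derived path turned
    out to contain; pass a different encoding if your source differs.
    """
    if isinstance(path, bytes):  # in Python 2.7, bytes is str
        return path.decode(encoding)
    return path  # already unicode text, return unchanged


# Shortened example path containing UTF-8 bytes for two katakana characters
raw = b'D:\\bar\\\xe3\x82\xaf\xe3\x83\xac\\foo.abc'
fixed = as_unicode_path(raw)
assert fixed == u'D:\\bar\\\u30af\u30ec\\foo.abc'
```

The os.listdir(u'.') call above suggests that, on Windows, Python 2 handles unicode paths correctly, so passing the decoded value to open() should avoid the mangled byte path.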
Update
I can't test this for you, but I suggest checking whether the path contains non-ASCII characters before calling .decode('utf8'). This is a bit hacky...
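One less hacky way to write that check is to try an ASCII decode and catch the failure, as an alternative to the translation table used in the answer's code. contains_non_ascii is a hypothetical helper that expects a byte string, and it should run under Python 2.7 and Python 3. Note that ASCII is a subset of UTF-8, so .decode('utf8') also succeeds on an all-ASCII byte string; decoding such a path is harmless either way.

```python
def contains_non_ascii(data):
    """Return True if the byte string has any byte above 127."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError:
        return True
    return False


assert not contains_non_ascii(b'D:/bar/PROGRA~1/foo.abc')   # '~' is plain ASCII
assert contains_non_ascii(b'D:/bar/\xe3\x82\xaf/foo.abc')  # UTF-8 bytes for a katakana character
```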
import urllib

# Translation table: printable ASCII (32-126) maps to itself, every other byte to '_'
ASCII_TRANS = '_'*32 + ''.join([chr(x) for x in range(32, 127)]) + '_'*129
path = path.strip()
path = path[17:]  # remove the file://localhost/ prefix
path = urllib.unquote(path)
if path.translate(ASCII_TRANS) != path:  # contains non-ASCII bytes
    path = path.decode('utf8')  # UTF-8 bytes -> unicode
path = urllib.url2pathname(path)
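For reference, the same steps can be bundled into one function. This is a minimal sketch rather than the answer's code: it swaps the hard-coded slice for a startswith check, percent-decodes to raw bytes, and always decodes those bytes as UTF-8 (assuming, as the answer found, that the path from the XML is UTF-8). file_url_to_path and PREFIX are hypothetical names. It should run under Python 2.7 (pass a byte string) and Python 3 (pass a str). Separators are left as forward slashes; on Windows you can still apply url2pathname (urllib.url2pathname in Python 2, urllib.request.url2pathname in Python 3) to the result.

```python
try:
    from urllib.parse import unquote_to_bytes  # Python 3
except ImportError:
    # Python 2: urllib.unquote returns a byte string when given one
    from urllib import unquote as unquote_to_bytes

PREFIX = 'file://localhost/'  # len(PREFIX) == 17, the magic number above


def file_url_to_path(url):
    """Turn a file://localhost/ URL into a unicode path (UTF-8 assumed)."""
    url = url.strip()
    if url.startswith(PREFIX):
        url = url[len(PREFIX):]
    raw = unquote_to_bytes(url)  # percent-escapes -> raw bytes
    return raw.decode('utf-8')   # raw UTF-8 bytes -> unicode text


path = file_url_to_path('file://localhost/D:/bar/%E3%82%AF%E3%83%AC/foo.abc')
assert path == u'D:/bar/\u30af\u30ec/foo.abc'
```

Decoding right after unquoting keeps every later step working on unicode, which is the same idea as the answer's code without the ASCII check.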