Python3 Qt unicode file name problems


Problem description

Similar to

QDir and QDirIterator ignore files with non-ASCII filenames

UnicodeEncodeError: 'latin-1' codec can't encode character

With regard to the second link above, I added test0() below. My understanding was that UTF-8 was the solution I was searching for, but alas, trying to encode the filename fails.

def test0():
    print("test0...using unicode literal")
    name = u"123c\udcb4.wav"
    test("test0b",  name)

    n = name.encode('utf-8') 
    print(n)
    n = QtCore.QFile.decodeName(n)
    print(n)

# From http://docs.python.org/release/3.0.1/howto/unicode.html
# This will indeed overwrite the correct file!
#    f = open(name, 'w')
#    f.write('blah\n')
#    f.close()

Test0 results...

test0...using unicode literal
test0b QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test0b QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test0b os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test0b os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True

Traceback (most recent call last):
  File "unicode.py", line 157, in <module>
    test0()
  File "unicode.py", line 42, in test0
    n = name.encode('utf-8') 
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed

Edit

Further reading of http://tools.ietf.org/html/rfc3629 tells me that "The definition of UTF-8 prohibits encoding character numbers between U+D800 and U+DFFF". So if UTF-8 doesn't allow these characters, how are you supposed to deal with a file that is so named? Python can create such files and test for their existence. So does this point to an issue with my use of the Qt API, or with the Qt API itself?
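One possible explanation for why Python can handle such a name: on POSIX systems, Python 3 decodes file names with the `surrogateescape` error handler (PEP 383), so a byte that is not valid UTF-8 is carried in the string as a lone surrogate in the range U+DC80 to U+DCFF. The sketch below assumes the on-disk name is the byte string `b"123c\xb4.wav"`; that is inferred from the `\udcb4` in the error messages, not confirmed.

```python
import os

# Assumed raw on-disk name: a lone 0xB4 byte is not valid UTF-8.
raw_name = b"123c\xb4.wav"

# surrogateescape maps each undecodable byte 0xNN to the code point U+DCNN.
name = raw_name.decode("utf-8", "surrogateescape")
print(repr(name))

# Strict UTF-8 refuses lone surrogates; this is the error seen in test0.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print("strict encode succeeded:", strict_ok)

# The same error handler reverses the mapping and restores the bytes.
round_trip = name.encode("utf-8", "surrogateescape")
print(round_trip == raw_name)

# os.fsencode() applies the filesystem encoding together with its
# error handler, so it is the usual way to get the original bytes back.
fs_bytes = os.fsencode(name)
print(fs_bytes)
```

If this reading is right, the string `'123c\udcb4.wav'` is not really text: it is a reversible stand-in for raw bytes, which would explain why os.path.exists accepts it while a strict `encode('utf-8')` does not.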

I am struggling to wrap my head around proper handling of unicode file names in Python3. Ultimately, I'm working on a Phonon-based music player. I've tried to isolate the problem(s) from that as much as possible. From the code below you will see that I've tried as many alternatives as I could find. My initial reaction is that there are bugs here: maybe mine, maybe in one or more libraries. Any help would be much appreciated!

I have a directory with 3 unicode file names, 123[abc]U.wav (U being a non-ASCII character). The first 2 files are handled properly, mostly; the third one, 123c, is just wrong.

from PyQt4 import QtGui,  QtCore
import sys,  os

def test(_name,  _file):
#    print(_name,  repr(_file))
    f = QtCore.QFile(_file)
#    f = QtCore.QFile(QtCore.QFile.decodeName(test))
    exists = f.exists()
    try:
        print(_name,  "QFile.exists",  f.fileName(),  exists)
    except UnicodeEncodeError as e:
        print(e,  repr(_file),  exists)
    fileInfo = QtCore.QFileInfo(_file)
    exists = fileInfo.exists()
    try:
        print(_name,  "QFileInfo.exists",  fileInfo.fileName(),  exists)
    except UnicodeEncodeError as e:
        print(e,  repr(_file),  exists)
    exists = os.path.exists(_file)
    try:
        print(_name,  "os.path.exists",  _file,  exists)
    except UnicodeEncodeError as e:
        print(e,  repr(_file),  exists)
    exists = os.path.isfile(_file)
    try:
        print(_name,  "os.path.isfile",  _file,  exists)
    except UnicodeEncodeError as e:
        print(e,  repr(_file),  exists)
    print()

def test1():
    args = QtGui.QApplication.arguments()
    print("test1...using QtGui.QApplication.arguments()")
    test("test1",  args[1])

def test2():
    print("test2...using sys.argv")
    test("test2",  sys.argv[1])

def test3():
    print("test3...QtGui.QFileDialog.getOpenFileName()")
    name = QtGui.QFileDialog.getOpenFileName()
    test("test3",  name)

def test4():
    print("test4...QtCore.QDir().entryInfoList()")
    p = os.path.abspath(__file__)
    p,  _ = os.path.split(p)
    d = QtCore.QDir(p)
    for inf in d.entryInfoList(QtCore.QDir.AllEntries|QtCore.QDir.NoDotAndDotDot|QtCore.QDir.System):
        print("test4",  inf.fileName())
#        if str(inf.fileName()).startswith("123c"):
        if u"123c\ufffd.wav" == inf.fileName():
#        if u"123c\udcb4.wav" == inf.fileName(): # This check fails..even tho that is what is reported in error messages for test2
            test("test4a",  inf.fileName())
            test("test4b",  inf.absoluteFilePath())

def test5():
    print("test5...os.listdir()")
    p = os.path.abspath(__file__)
    p,  _ = os.path.split(p)
    dirList = os.listdir(p)
    for file in dirList:
        fullfile = os.path.join(p, file)
        try:
            print("test5",  file)
        except UnicodeEncodeError as e:
            print(e)
        print("test5",  repr(fullfile))
#        if u"123c\ufffd.wav" == file: # This check fails..even tho it worked in test4
        if u"123c\udcb4.wav" == file:
            test("test5a",  file)
            test("test5b",  fullfile)
        print()

def test6():
    print("test6...Phonon and QtGui.QFileDialog.getOpenFileName()")
    from PyQt4.phonon import Phonon

    class Window(QtGui.QDialog):
        def __init__(self):
            QtGui.QDialog.__init__(self, None)
            self.mediaObject = Phonon.MediaObject(self)
            self.audioOutput = Phonon.AudioOutput(Phonon.MusicCategory, self)
            Phonon.createPath(self.mediaObject, self.audioOutput)
            self.mediaObject.stateChanged.connect(self.handleStateChanged)

            name = QtGui.QFileDialog.getOpenFileName()# works with python3..not for 123c
#            name = QtGui.QApplication.arguments()[1] # works with python2..but not python3...not for 123c
#            name = sys.argv[1] # works with python3..but not python2...not for 123c

#            p = os.path.abspath(__file__)
#            p,  _ = os.path.split(p)
#            print(p)
#            name = os.path.join(p, str(name))

            self.mediaObject.setCurrentSource(Phonon.MediaSource(name))

            self.mediaObject.play()

        def handleStateChanged(self, newstate, oldstate):
            if newstate == Phonon.PlayingState:
                source = self.mediaObject.currentSource().fileName()
                print('test6 playing: :', source)
            elif newstate == Phonon.StoppedState:
                source = self.mediaObject.currentSource().fileName()
                print('test6 stopped: :', source)
            elif newstate == Phonon.ErrorState:
                source = self.mediaObject.currentSource().fileName()
                print('test6 ERROR: could not play:', source)
    win = Window()
    win.resize(200, 100)
#    win.show()
    win.exec_()

def timerTick():
    QtGui.QApplication.exit()

if __name__ == '__main__':

    app = QtGui.QApplication(sys.argv)
    app.setApplicationName('unicode_test')

    test1()
    test2()
    test3()
    test4()
    test5()
    test6()
    timer = QtCore.QTimer()
    timer.timeout.connect(timerTick)
    timer.start(1)
    sys.exit(app.exec_())
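
The try/except blocks in test() are there because print() has to encode the string for the terminal, and a lone surrogate such as `\udcb4` cannot be encoded. Below is a minimal sketch of one way to print such names without an exception. The helper name `printable` is hypothetical and not part of the script above.

```python
def printable(name):
    """Return a copy of name that can be encoded as strict UTF-8."""
    # backslashreplace turns unencodable characters into \uXXXX escapes,
    # so a lone surrogate stays visible instead of raising an error.
    return name.encode("utf-8", "backslashreplace").decode("utf-8")


# The surrogate is shown as a literal escape sequence in the output.
print(printable("123c\udcb4.wav"))
```

The returned string is for display only; the original `name` still has to be used for any file system call, since the escaped copy no longer refers to the same file.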

Test results with 123a...

python3 unicode.py 123a�.wav 
test1...using QtGui.QApplication.arguments()
test1 QFile.exists unknown False
test1 QFileInfo.exists unknown False
test1 os.path.exists unknown False
test1 os.path.isfile unknown False

test2...using sys.argv
test2 QFile.exists 123a�.wav True
test2 QFileInfo.exists 123a�.wav True
test2 os.path.exists 123a�.wav True
test2 os.path.isfile 123a�.wav True

test3...QtGui.QFileDialog.getOpenFileName()
test3 QFile.exists /home/mememe/Desktop/test/unicode/123a�.wav True
test3 QFileInfo.exists 123a�.wav True
test3 os.path.exists /home/mememe/Desktop/test/unicode/123a�.wav True
test3 os.path.isfile /home/mememe/Desktop/test/unicode/123a�.wav True

test4...QtCore.QDir().entryInfoList()
test4 123a�.wav
test4 123bÆ.wav
test4 123c�.wav
test4a QFile.exists 123c�.wav False
test4a QFileInfo.exists 123c�.wav False
test4a os.path.exists 123c�.wav False
test4a os.path.isfile 123c�.wav False

test4b QFile.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test4b QFileInfo.exists 123c�.wav False
test4b os.path.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test4b os.path.isfile /home/mememe/Desktop/test/unicode/123c�.wav False

test4 unicode.py
test5...os.listdir()
test5 unicode.py
test5 '/home/mememe/Desktop/test/unicode/unicode.py'

test5 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed
test5 '/home/mememe/Desktop/test/unicode/123c\udcb4.wav'
test5a QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test5a QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test5a os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test5a os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True

test5b QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' False
test5b QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' False
test5b os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' True
test5b os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' True


test5 123bÆ.wav
test5 '/home/mememe/Desktop/test/unicode/123bÆ.wav'

test5 123a�.wav
test5 '/home/mememe/Desktop/test/unicode/123a�.wav'

test6...Phonon and QtGui.QFileDialog.getOpenFileName()
test6 stopped: : /home/mememe/Desktop/test/unicode/123a�.wav
test6 playing: : /home/mememe/Desktop/test/unicode/123a�.wav
test6 stopped: : /home/mememe/Desktop/test/unicode/123a�.wav

Test results with 123b...

python3 unicode.py 123bÆ.wav 
test1...using QtGui.QApplication.arguments()
test1 QFile.exists 123b.wav False
test1 QFileInfo.exists 123b.wav False
test1 os.path.exists 123b.wav False
test1 os.path.isfile 123b.wav False

test2...using sys.argv
test2 QFile.exists 123bÆ.wav True
test2 QFileInfo.exists 123bÆ.wav True
test2 os.path.exists 123bÆ.wav True
test2 os.path.isfile 123bÆ.wav True

test3...QtGui.QFileDialog.getOpenFileName()
test3 QFile.exists /home/mememe/Desktop/test/unicode/123bÆ.wav True
test3 QFileInfo.exists 123bÆ.wav True
test3 os.path.exists /home/mememe/Desktop/test/unicode/123bÆ.wav True
test3 os.path.isfile /home/mememe/Desktop/test/unicode/123bÆ.wav True

test4...QtCore.QDir().entryInfoList()
test4 123a�.wav
test4 123bÆ.wav
test4 123c�.wav
test4a QFile.exists 123c�.wav False
test4a QFileInfo.exists 123c�.wav False
test4a os.path.exists 123c�.wav False
test4a os.path.isfile 123c�.wav False

test4b QFile.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test4b QFileInfo.exists 123c�.wav False
test4b os.path.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test4b os.path.isfile /home/mememe/Desktop/test/unicode/123c�.wav False

test4 unicode.py
test5...os.listdir()
test5 unicode.py
test5 '/home/mememe/Desktop/test/unicode/unicode.py'

test5 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed
test5 '/home/mememe/Desktop/test/unicode/123c\udcb4.wav'
test5a QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test5a QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test5a os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test5a os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True

test5b QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' False
test5b QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' False
test5b os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' True
test5b os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' True


test5 123bÆ.wav
test5 '/home/mememe/Desktop/test/unicode/123bÆ.wav'

test5 123a�.wav
test5 '/home/mememe/Desktop/test/unicode/123a�.wav'

test6...Phonon and QtGui.QFileDialog.getOpenFileName()
test6 stopped: : /home/mememe/Desktop/test/unicode/123bÆ.wav
test6 playing: : /home/mememe/Desktop/test/unicode/123bÆ.wav
test6 stopped: : /home/mememe/Desktop/test/unicode/123bÆ.wav

Test results with 123c...

python3 unicode.py 123c�.wav 
test1...using QtGui.QApplication.arguments()
test1 QFile.exists unknown False
test1 QFileInfo.exists unknown False
test1 os.path.exists unknown False
test1 os.path.isfile unknown False

test2...using sys.argv
test2 QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test2 QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test2 os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test2 os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True

test3...QtGui.QFileDialog.getOpenFileName()
test3 QFile.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test3 QFileInfo.exists 123c�.wav False
test3 os.path.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test3 os.path.isfile /home/mememe/Desktop/test/unicode/123c�.wav False

test4...QtCore.QDir().entryInfoList()
test4 123a�.wav
test4 123bÆ.wav
test4 123c�.wav
test4a QFile.exists 123c�.wav False
test4a QFileInfo.exists 123c�.wav False
test4a os.path.exists 123c�.wav False
test4a os.path.isfile 123c�.wav False

test4b QFile.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test4b QFileInfo.exists 123c�.wav False
test4b os.path.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test4b os.path.isfile /home/mememe/Desktop/test/unicode/123c�.wav False

test4 unicode.py
test5...os.listdir()
test5 unicode.py
test5 '/home/mememe/Desktop/test/unicode/unicode.py'

test5 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed
test5 '/home/mememe/Desktop/test/unicode/123c\udcb4.wav'
test5a QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test5a QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test5a os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test5a os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True

test5b QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' False
test5b QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' False
test5b os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' True
test5b os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' True


test5 123bÆ.wav
test5 '/home/mememe/Desktop/test/unicode/123bÆ.wav'

test5 123a�.wav
test5 '/home/mememe/Desktop/test/unicode/123a�.wav'

test6...Phonon and QtGui.QFileDialog.getOpenFileName()
test6 stopped: : /home/mememe/Desktop/test/unicode/123c�.wav
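
A possible reading of the test3 and test4 results for 123c: the name Qt returns compares equal to `u"123c\ufffd.wav"`, which suggests the undecodable byte was replaced by U+FFFD when the name was converted to a string. The sketch below imitates that with Python's `replace` error handler. It is an assumption about what happens inside Qt, not a confirmed description, and the raw byte 0xB4 is inferred from the error messages.

```python
# Assumed raw on-disk name, inferred from '\udcb4' in the output above.
raw_name = b"123c\xb4.wav"

# Lossy decoding: the invalid byte becomes U+FFFD, matching test4.
lossy = raw_name.decode("utf-8", "replace")
print(lossy == "123c\ufffd.wav")

# Encoding the lossy string yields the UTF-8 form of U+FFFD, not 0xB4,
# so a lookup by this name asks the file system for a different file.
re_encoded = lossy.encode("utf-8")
print(re_encoded == raw_name)
print(re_encoded)
```

Under that assumption the original byte is lost once the name has passed through a string with U+FFFD in it, so no later encoding step can recover a path that exists on disk.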

Interesting things to note about the test results...
