Unicode and Zipfile problems
Problem description
AAAAAAAARG I hate the way python handles unicode. Here is a nice
problem for y'all to enjoy: say you have a variable that's unicode

directory = u"c:\\temp"   # note the doubled backslash: in u"c:\temp", "\t" is a tab

It's unicode not because you want it to be, but because, for example, it was
read from _winreg, which returns unicode.

You do an os.listdir(directory). Note that all filenames returned are
now unicode. (A change introduced, I believe, in 2.3.)

You add the filenames to a zipfile.ZipFile object. Sometimes, you will
get this exception:

Traceback (most recent call last):
  File "collect_trace_info.py", line 65, in CollectTraceInfo
    z.write(pathname)
  File "C:\Python23\lib\zipfile.py", line 416, in write
    self.fp.write(zinfo.FileHeader())
  File "C:\Python23\lib\zipfile.py", line 170, in FileHeader
    return header + self.filename + self.extra
UnicodeDecodeError: 'ascii' codec can't decode byte 0x88 in position 12: ordinal not in range(128)

After you have regained your composure, you find the reason: "header"
is a byte string generated by struct.pack(), whereas self.filename is a
unicode string because os.listdir returned it as unicode. Concatenating
the two makes Python implicitly decode "header" with the default ASCII
codec. If "header" contains any byte above 0x7F (which can, but need
not, happen, depending on the type of file), you have an exception
waiting for yourself - sometimes. Great. (The same will probably occur
if the filename contains characters above 0x7F.) The problem does not
occur if you have "str" type filenames, because then no back-and-forth
conversion is made.
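
The failure mode described above can be modeled in Python 3, where the implicit str/unicode coercion no longer exists and has to be spelled out. This is a sketch, not zipfile's code: py2_style_concat and concat_fails are hypothetical helpers that decode the byte string with ASCII (what Python 2 did silently for str + unicode), and the two-byte "headers" are made up for illustration.

```python
import struct

def py2_style_concat(header: bytes, filename: str) -> str:
    """Model Python 2's `str + unicode`: the byte string is implicitly
    decoded with the default 'ascii' codec before concatenating."""
    return header.decode("ascii") + filename

def concat_fails(header: bytes, filename: str) -> bool:
    """True if the modeled concatenation raises UnicodeDecodeError."""
    try:
        py2_style_concat(header, filename)
        return False
    except UnicodeDecodeError:
        return True

# Fake header fields packed with struct, as zipfile does for real headers.
# 0x88 is the byte from the traceback; 0x41 is plain ASCII 'A'.
bad_header = struct.pack("<H", 0x88)   # b'\x88\x00'
good_header = struct.pack("<H", 0x41)  # b'A\x00'

print(concat_fails(bad_header, "temp.txt"))   # True
print(concat_fails(good_header, "temp.txt"))  # False
```

Whether it fails depends only on the bytes in the header, which is why the error shows up "sometimes".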

There is a simple fix: byte-encode the path before calling z.write().
Here is some sample code that shows the problem:
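
import os, zipfile, win32api

def test(directory):
    z = zipfile.ZipFile(os.path.join(directory, "temp.zip"), "w", zipfile.ZIP_DEFLATED)
    for filename in os.listdir(directory):
        pathname = os.path.join(directory, filename)
        # Skip subdirectories (zipfile.write() in 2.3 tries to open them
        # as files) and the archive that is being written.
        if filename == "temp.zip" or not os.path.isfile(pathname):
            continue
        z.write(pathname)
    z.close()

if __name__ == "__main__":
    test(unicode(win32api.GetSystemDirectory()))
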
Note: It might work on your system, depending on the types of files.
To fix it, use
z.write(os.path.join(directory, filename).encode("latin-1"))
But to my thinking, this is a bug in zipfile.py, really.
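
Why the latin-1 workaround works, and where it stops working: Latin-1 maps each code point from U+0000 to U+00FF to the single byte with the same value and has no mapping for anything higher. A small Python 3 sketch; latin1_encodable is a hypothetical helper.

```python
def latin1_encodable(path: str) -> bool:
    """True if every character of `path` has a Latin-1 byte."""
    try:
        path.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# Every code point up to U+00FF becomes the single byte with the same value.
assert all(chr(i).encode("latin-1") == bytes([i]) for i in range(256))

print("caf\u00e9.txt".encode("latin-1"))  # b'caf\xe9.txt'
print(latin1_encodable("caf\u00e9.txt"))  # True
print(latin1_encodable("\u4e2d.txt"))     # False: U+4E2D is above U+00FF
```

So the workaround holds for Western European file names and raises UnicodeEncodeError as soon as a name contains, for example, a CJK character.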

Now, could anybody please just write an
"i-don't-care-if-my-app-can-display-klingon-characters" raw byte
encoding which doesn't raise any exceptions and doesn't care whether
or not the characters are above 0x7F? It's OK if I cannot port
my batch scripts to Swahili, really.
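
For comparison, Python 3 removed the str/unicode mixing behind this error: zipfile accepts str names directly and, as far as I know, stores non-ASCII names as UTF-8 with general-purpose flag bit 0x800 set. A minimal sketch under those assumptions; zip_directory is a hypothetical helper and the file names are illustrative. The archive is written to a separate directory so it is never listed by os.listdir().

```python
import os
import tempfile
import zipfile

def zip_directory(directory: str, archive_path: str) -> dict:
    """Zip the regular files in `directory`; map each stored name to
    whether its UTF-8 filename flag (bit 0x800) is set."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as z:
        for filename in os.listdir(directory):  # str argument -> str names
            path = os.path.join(directory, filename)
            if os.path.isfile(path):
                # arcname stores a relative name instead of the full path.
                z.write(path, arcname=filename)
    with zipfile.ZipFile(archive_path) as z:
        return {info.filename: bool(info.flag_bits & 0x800) for info in z.infolist()}

with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
    for name in ("plain.txt", "caf\u00e9.txt"):
        with open(os.path.join(src, name), "w", encoding="utf-8") as f:
            f.write("data")
    flags = zip_directory(src, os.path.join(out, "temp.zip"))

print(flags)
```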

Recommended answer
Here is my "dontcare"-codec, ugly on purpose:
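
import codecs

def E(i, e=''):
    # Encode: each character becomes one byte; code points above 0xFF
    # are clamped to 0xFF.
    l = lambda c: chr(min(ord(c), 255))
    r = "".join(map(l, i))
    return (r, len(r))

def D(i, e=''):
    # Decode: each byte becomes the Unicode character with the same value.
    l = lambda c: unichr(ord(c))
    r = u"".join(map(l, i))
    return (r, len(r))

class c(codecs.Codec):
    def encode(self, i, e=''):
        return E(i, e)
    def decode(self, i, e=''):
        return D(i, e)

class w(c, codecs.StreamWriter): pass
class r(c, codecs.StreamReader): pass

# Registry entry: (encoder, decoder, stream reader, stream writer)
getregentry = lambda: (E, D, r, w)
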
To install, save it as python23\lib\encodings\dontcare.py and enable it
as the default encoding in site.py. Here is some test code:

try:
    print unicode(chr(0xFF))
except UnicodeDecodeError, e:
    print e

try:
    print unichr(12345)
except UnicodeEncodeError, e:
    print e
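
Python 3 has no sys.setdefaultencoding(), so a codec like this can no longer be made the default; it can still be registered and named explicitly. A rough Python 3 sketch of the same mapping using codecs.register() and codecs.CodecInfo; the function names dontcare_encode, dontcare_decode and _search are hypothetical, and the clamping to 0xFF mirrors E above.

```python
import codecs

def dontcare_encode(text, errors="strict"):
    # Like E: each character becomes one byte; code points above 0xFF clamp to 0xFF.
    return bytes(min(ord(ch), 0xFF) for ch in text), len(text)

def dontcare_decode(data, errors="strict"):
    # Like D: each byte becomes the character with the same code point.
    raw = bytes(data)  # data may arrive as a memoryview
    return "".join(chr(b) for b in raw), len(raw)

def _search(name):
    # Search function for codecs.register(); returns None for other names.
    if name == "dontcare":
        return codecs.CodecInfo(dontcare_encode, dontcare_decode, name="dontcare")
    return None

codecs.register(_search)

print("a\u3039".encode("dontcare"))                  # b'a\xff'
print(b"\x88\xff".decode("dontcare") == "\x88\xff")  # True
```

Every input encodes and decodes without an exception; the price is that anything above U+00FF silently turns into byte 0xFF.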
" Gerson Kurz" < GE ********* @ t-online.de> schrieb im Newsbeitrag
新闻:3f ************* @ news.t-online.de ...
| AAAAAAAARG我讨厌python处理unicode的方式。这是一个很好的
|你们都喜欢的问题:说你有一个变量即unicode
|
| directory = u" c:\ temp"
|
|它的unicode不是因为你想要它,而是因为它的例如
|从_winreg读取,返回unicode。
|
|你做一个os.listdir(目录)。请注意,返回的所有文件名都是
|现在unicode。 (改变介绍我相信2.3)。
错误。
只有当type(目录)给你<输入''unicode''>
如果在执行os.listdir(目录)之前调用str(目录)
你(在大多数情况下)你甚至想要通知并且可以继续做你想做的事情
做
就好了 - 加上,这是好事 - 你可以忘记
那些你以后建议的黑客,有些人会考虑*邪恶*。
这样可以节省一些时间。
嘿,离开我的斯瓦希里语朋友你会独自一人! ;)
HTH,
Vincent Wehren
|
|您将文件名添加到zipfile.ZipFile对象。有时,你会
|得到这个例外:
|
| Traceback(最近一次电话会议):
|文件collect_trace_info.py,第65行,在CollectTraceInfo中
| z.write(路径名)
|文件C:\Python23 \lib \ zipfile.py,第416行,写作
| self.fp.write(zinfo.FileHeader())
|文件C:\Python23 \lib \ zipfile.py,第170行,在FileHeader中
| return header + self.filename + self.extra
| UnicodeDecodeError:''ascii''编解码器无法解码位于字节0x88的位置
| 12:
|序数不在范围内(128)
|
|重新获得镇定后,您会找到原因:标题
|是struct.pack()生成的字节字符串。然而,self.filename是
| unicode字符串,因为它由os.listdir作为unicode返回。如果
| "报头QUOT;生成0x7F以上的任何东西 - 可以但不一定需要
|发生,取决于您等待异常的文件类型
|对你自己 - 有时候。大。 (如果
|文件名包含字符> 0x7F),可能会发生同样的情况。如果你这个问题没有发生
|有str输入文件名,因为没有后退转换是
|正在制作。
|
|在调用z.write()字节编码之前,有一个简单的修复。这里
|是一个示例代码:
|
| import os,zipfile,win32api
|
| def测试(目录):
| z =
|
zipfile.ZipFile(os.path.join(目录," temp.zip")," w",zipfile.ZIP_DEFLATED)
|对于os.listdir(目录)中的文件名:
| z.write(os.path.join(目录,文件名))
| z.close()
|
| if __name__ ==" __ main __":
| test(unicode(win32api.GetSystemDirectory()))
|
|注意:它可能适用于您的系统,具体取决于文件类型。
|要解决它,请使用
|
| z.write(os.path.join(目录,文件名).encode(" latin-1"))
|
|但据我的想法,这是zipfile.py中的一个错误,真的。
|
|现在,任何人都可以写一个
| " I-申明,不要护理-IF-MY-APP-可以显示-克林贡字符"原始字节
|编码不会抛出任何断言,也不在乎
|或者字符是否在0x7F范围内?如果我不能移动它就可以了。
|我的批次写到swaheli,真的。
|
|
"Gerson Kurz" <ge*********@t-online.de> schrieb im Newsbeitrag
news:3f*************@news.t-online.de...
| AAAAAAAARG I hate the way python handles unicode. Here is a nice
| problem for y'all to enjoy: say you have a variable that's unicode
|
| directory = u"c:\temp"
|
| It's unicode not because you want it to, but because it's, for example,
| read from _winreg which returns unicode.
|
| You do an os.listdir(directory). Note that all filenames returned are
| now unicode. (Change introduced I believe in 2.3).
Wrong.

That's only true if type(directory) gives you <type 'unicode'>.
If you call str(directory) before doing os.listdir(directory),
you (in most cases) won't even notice, and can continue doing what you
want to do just fine - plus, and that's the good part - you can forget
about those hacks you suggest later, which some would consider *evil*.
It'll save you some time too.

Hey, and leave my Swahili friends alone, will ya! ;)
HTH,
Vincent Wehren
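
Python 3 keeps the rule described here: os.listdir() returns names of the same type as its argument (str for str, bytes for bytes). A minimal sketch; the temporary directory and the name example.txt are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "example.txt"), "w").close()
    text_names = os.listdir(d)               # str argument -> str names
    byte_names = os.listdir(os.fsencode(d))  # bytes argument -> bytes names

print(text_names, byte_names)  # ['example.txt'] [b'example.txt']
```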
" vincent wehren" < 6 ***** @ visualtrans.de> schrieb im Newsbeitrag
新闻:bo ********** @ news4.tilbu1.nb.home.nl ...
|
| Gerson Kurz < GE ********* @ t-online.de> schrieb im Newsbeitrag
|新闻:3f ************* @ news.t-online.de ...
| | AAAAAAAARG我讨厌python处理unicode的方式。这是一个很好的
| |你们都喜欢的问题:说你有一个变量就是unicode
| |
| | directory = u" c:\ temp"
| |
| |它的unicode不是因为你想要它,而是因为它的例如
| |从_winreg读取,返回unicode。
| |
| |你做一个os.listdir(目录)。请注意,返回的所有文件名都是
| |现在unicode。 (改变介绍我相信2.3)。
|
|错了。
|
|只有当type(目录)给你< type''unicode''>
|时才会这样。如果在执行os.listdir(目录)之前调用str(目录)
|你(在大多数情况下)甚至想要注意并且可以继续做你想要的事情
到
当我说在大多数情况下时,我的意思是所有那些目录
的字符都没有映射到ASCII
范围之外的单字节值的字符。在其他情况下你会去:
目录=
directory.encode(your_favorite_and_hoepfully_the_r ight_single_byte_legacy_en
coding_here)
问候,
Vincent
|做
|很好 - 加上,这是好的部分 - 你可以忘记
|你以后建议的那些黑客,有些人会考虑*邪恶*。
|它也可以节省一些时间。
|
|嘿,离开我的斯瓦希里语朋友吧! ;)
|
| HTH,
| Vincent Wehren
|
|
|
| |
| |您将文件名添加到zipfile.ZipFile对象。有时,你会
| |得到这个例外:
| |
| | Traceback(最近一次电话会议):
| |文件collect_trace_info.py,第65行,在CollectTraceInfo中
| | z.write(路径名)
| |文件C:\Python23 \lib \ zipfile.py,第416行,写作
| | self.fp.write(zinfo.FileHeader())
| |文件C:\Python23 \lib \ zipfile.py,第170行,在FileHeader中
| | return header + self.filename + self.extra
| | UnicodeDecodeError:''ascii''编解码器无法解码位于字节0x88的位置
| | 12:
| |序数不在范围内(128)
| |
| |重新获得镇定后,您会找到原因:标题
| |是struct.pack()生成的字节字符串。然而,self.filename是
| | unicode字符串,因为它由os.listdir作为unicode返回。如果
| | "报头QUOT;生成0x7F以上的任何东西 - 可以但不一定需要
| |发生,取决于您等待异常的文件类型
| |对你自己 - 有时候。大。 (如果
| | filename包含字符> 0x7F),可能会发生同样的情况。如果你这个问题没有发生
| |有str输入文件名,因为没有后退转换是
| |正在制作。
| |
| |在调用z.write()字节编码之前,有一个简单的修复。这里
| |是一个示例代码:
| |
| | import os,zipfile,win32api
| |
| | def测试(目录):
| | z =
| |
|
zipfile.ZipFile(os.path.join(目录," temp.zip")," w",zipfile.ZIP_DEFLATED)
| |对于os.listdir(目录)中的文件名:
| | z.write(os.path.join(目录,文件名))
| | z.close()
| |
| | if __name__ ==" __ main __":
| | test(unicode(win32api.GetSystemDirectory()))
| |
| |注意:它可能适用于您的系统,具体取决于文件类型。
| |要修复它,请使用
| |
| | z.write(os.path.join(目录,文件名).encode(" latin-1"))
| |
| |但据我的想法,这是zipfile.py中的一个错误,真的。
| |
| |现在,任何人都可以写一个
| | " I-申明,不要护理-IF-MY-APP-可以显示-克林贡字符"原始字节
| |编码不会抛出任何断言,也不在乎
| |或者字符是否在0x7F范围内?如果我不能移动它就可以了。
| |我的批量文件是swaheli,真的。
| |
| |
|
|
"vincent wehren" <vi*****@visualtrans.de> schrieb im Newsbeitrag
news:bo**********@news4.tilbu1.nb.home.nl...
|
| "Gerson Kurz" <ge*********@t-online.de> schrieb im Newsbeitrag
| news:3f*************@news.t-online.de...
| | AAAAAAAARG I hate the way python handles unicode. Here is a nice
| | problem for y'all to enjoy: say you have a variable that's unicode
| |
| | directory = u"c:\temp"
| |
| | It's unicode not because you want it to, but because it's, for example,
| | read from _winreg which returns unicode.
| |
| | You do an os.listdir(directory). Note that all filenames returned are
| | now unicode. (Change introduced I believe in 2.3).
|
| Wrong.
|
| That's only true if type(directory) gives you <type 'unicode'>.
| If you call str(directory) before doing os.listdir(directory),
| you (in most cases) won't even notice and can continue doing what you want to
And when I say "in most cases", I mean all those cases where "directory"
doesn''t have characters that map to a single-byte value outside of the ASCII
range. In other cases you''ll just go :
directory =
directory.encode(your_favorite_and_hoepfully_the_r ight_single_byte_legacy_en
coding_here)
before calling os.listdir(directory)
Regards,
Vincent
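
What str(directory) does in Python 2 is, in effect, an encode with the default codec (ASCII unless site.py changes it), which is why it only works for pure-ASCII names. A Python 3 sketch comparing that with a single-byte Windows code page; encode_or_none is a hypothetical helper, the directory name is made up, and cp1252 is only an example of a legacy encoding, not necessarily the right one for a given system.

```python
from typing import Optional

def encode_or_none(directory: str, encoding: str) -> Optional[bytes]:
    """Return `directory` encoded with `encoding`, or None if some
    character has no mapping in that encoding."""
    try:
        return directory.encode(encoding)
    except UnicodeEncodeError:
        return None

# Hypothetical directory name containing one non-ASCII character.
directory = "C:\\Temp\\M\u00fcller"

print(encode_or_none(directory, "ascii"))   # None: roughly what str() hit in Python 2
print(encode_or_none(directory, "cp1252"))  # b'C:\\Temp\\M\xfcller'
print(encode_or_none("\u4e2d", "cp1252"))   # None: no single-byte mapping
```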