Unicode and Zipfile problems

Problem description

AAAAAAAARG I hate the way python handles unicode. Here is a nice
problem for y'all to enjoy: say you have a variable that's unicode

directory = u"c:\\temp"

It's unicode not because you want it to be, but because, for example,
it was read from _winreg, which returns unicode.

You do an os.listdir(directory). Note that all filenames returned are
now unicode. (A change introduced, I believe, in 2.3.)

You add the filenames to a zipfile.ZipFile object. Sometimes, you will
get this exception:

Traceback (most recent call last):
  File "collect_trace_info.py", line 65, in CollectTraceInfo
    z.write(pathname)
  File "C:\Python23\lib\zipfile.py", line 416, in write
    self.fp.write(zinfo.FileHeader())
  File "C:\Python23\lib\zipfile.py", line 170, in FileHeader
    return header + self.filename + self.extra
UnicodeDecodeError: 'ascii' codec can't decode byte 0x88 in position 12: ordinal not in range(128)

After you have regained your composure, you find the reason: "header"
is a byte string generated by struct.pack(), whereas self.filename is a
unicode string because os.listdir returned it as unicode. To evaluate
header + self.filename, Python first decodes "header" with the default
ASCII codec, so if "header" contains any byte above 0x7F, you have an
exception waiting for you. Whether it does depends on the file (the
header packs binary fields such as the modification date and time, the
CRC and the sizes), so this can, but need not, happen: sometimes. Great.
(The same will probably occur if the filename contains characters above
0x7F.) The problem does not occur if you have "str" filenames, because
then no back-and-forth conversion is made.
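Python 3 no longer performs this implicit conversion (adding bytes to str raises TypeError instead), but the failing step can be reproduced explicitly. The sketch below is a hypothetical Python 3 illustration, not zipfile's actual code: it packs a made-up two-field header with struct.pack() and decodes it as ASCII, which is what Python 2 did behind the scenes for header + self.filename. The field values, including 0x88 (chosen only to mirror the byte in the traceback), are assumptions.

```python
import struct

def python2_style_concat(header, filename):
    # Python 2 evaluated `header + filename` (str + unicode) by first
    # decoding `header` with the default ASCII codec; this does that explicitly.
    return header.decode("ascii") + filename

# Hypothetical header: the local-file signature plus one 16-bit field.
# A real ZIP header also packs date/time, CRC and sizes, which vary per file.
lucky_header = struct.pack("<4sH", b"PK\x03\x04", 0x0041)    # every byte < 0x80
unlucky_header = struct.pack("<4sH", b"PK\x03\x04", 0x0088)  # contains 0x88

print(repr(python2_style_concat(lucky_header, "notes.txt")))  # works
try:
    python2_style_concat(unlucky_header, "notes.txt")
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0x88 in position 4: ...
```

The same file name succeeds or fails purely depending on the header bytes, which is why the error only shows up "sometimes".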

There is a simple fix: byte-encode the path before calling z.write().
Here is some sample code:

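import os, zipfile, win32api

def test(directory):
    archive = os.path.join(directory, "temp.zip")
    z = zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED)
    for filename in os.listdir(directory):
        pathname = os.path.join(directory, filename)
        # skip subdirectories and the archive currently being written
        if not os.path.isfile(pathname) or pathname == archive:
            continue
        z.write(pathname)
    z.close()

if __name__ == "__main__":
    test(unicode(win32api.GetSystemDirectory()))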

Note: It might work on your system, depending on the types of files.
To fix it, use

z.write(os.path.join(directory, filename).encode("latin-1"))

But to my mind, this is really a bug in zipfile.py.
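For comparison, Python 3's zipfile accepts str filenames directly and stores non-ASCII names as UTF-8 (setting the archive's UTF-8 filename flag), so no manual encoding step is needed. Below is a minimal Python 3 sketch of the test() function above, under these assumptions: it writes into an in-memory io.BytesIO buffer rather than a temp.zip inside the listed directory (otherwise the archive would appear in its own listing), skips subdirectories, and uses a throwaway directory with hypothetical file names in place of the system directory.

```python
import io
import os
import tempfile
import zipfile

def zip_directory(directory, buf):
    """Add every regular file in `directory` to a ZIP archive written to `buf`."""
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        for filename in os.listdir(directory):  # str argument -> str names
            path = os.path.join(directory, filename)
            if os.path.isfile(path):
                # arcname keeps entries relative; no .encode() is needed here
                z.write(path, arcname=filename)

with tempfile.TemporaryDirectory() as tmp:
    for name in ("plain.txt", "caf\u00e9.txt"):  # hypothetical file names
        with open(os.path.join(tmp, name), "w") as f:
            f.write("hello")
    archive = io.BytesIO()
    zip_directory(tmp, archive)

# ZipFile does not close a file object it was handed, so we can reread it.
with zipfile.ZipFile(archive) as z:
    print(ascii(sorted(z.namelist())))
```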

Now, could anybody please just write an
"i-don't-care-if-my-app-can-display-klingon-characters" raw byte
encoding that doesn't throw any exceptions and doesn't care whether
or not the characters are above 0x7F? It's OK if I cannot port
my batch scripts to Swahili, really.

Recommended answer

Here is my "dontcare"-codec, ugly on purpose:

import codecs

# encoder: unicode -> str, clamping every code point to 0..255
def E(i,e=''):
    l=lambda c:chr(min(ord(c),255))
    r="".join(map(l,i))
    return (r,len(r))

# decoder: str -> unicode, mapping each byte to the same code point
def D(i,e=''):
    l=lambda c: unichr(ord(c))
    r=u"".join(map(l,i))
    return (r,len(r))

class c(codecs.Codec):
    def encode(self, i,e=''):
        return E(i,e)
    def decode(self, i,e=''):
        return D(i,e)

class w(c,codecs.StreamWriter): pass
class r(c,codecs.StreamReader): pass

# registry entry: (encoder, decoder, stream reader, stream writer)
getregentry = lambda: (E, D, r, w)

To install, save it as python23\lib\encodings\dontcare.py and enable it
in site.py (i.e. make it the default encoding, which site.py sets with
sys.setdefaultencoding()). Here is some test code:

try:
    print unicode(chr(0xFF))
except UnicodeDecodeError, e:
    print e

try:
    print unichr(12345)
except UnicodeEncodeError, e:
    print e
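Python 3 has no sys.setdefaultencoding() and no longer converts implicitly between bytes and str, so a codec like this can no longer become the silent default; it can still be registered and requested by name. The following is a hypothetical Python 3 port of the same clamping behaviour, using codecs.register() with a search function that returns a codecs.CodecInfo. The codec name "dontcare" is simply the one from the post, and the `errors` argument is ignored, as in the original.

```python
import codecs

def _encode(text, errors="strict"):
    # Clamp each code point to 0..255, so anything above U+00FF becomes 0xFF.
    # `errors` is accepted for API compatibility but ignored, as in the post.
    return bytes(min(ord(ch), 255) for ch in text), len(text)

def _decode(data, errors="strict"):
    # Map each byte to the code point with the same value (like latin-1).
    data = bytes(data)  # the input may arrive as a memoryview
    return "".join(map(chr, data)), len(data)

def _search(name):
    # codecs.lookup() passes search functions a normalized, lowercase name.
    if name == "dontcare":
        return codecs.CodecInfo(_encode, _decode, name="dontcare")
    return None

codecs.register(_search)

print(ascii(b"\xff".decode("dontcare")))  # decoding never raises
print(chr(12345).encode("dontcare"))      # clamped to b'\xff' instead of raising
```

Unlike the Python 2 version, every call site must name the codec explicitly, which keeps the lossy conversion visible.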

"Gerson Kurz" <ge*********@t-online.de> wrote in message
news:3f*************@news.t-online.de...
| AAAAAAAARG I hate the way python handles unicode. Here is a nice
| problem for y'all to enjoy: say you have a variable thats unicode
|
| directory = u"c:\temp"
|
| Its unicode not because you want it to, but because its for example
| read from _winreg which returns unicode.
|
| You do an os.listdir(directory). Note that all filenames returned are
| now unicode. (Change introduced I believe in 2.3).

Wrong.

That's only true if type(directory) gives you <type 'unicode'>.
If you call str(directory) before doing os.listdir(directory),
you (in most cases) won't even notice and can continue doing what you
want to do just fine. Plus, and that's the good part, you can forget
about those hacks you suggest later, which some would consider *evil*.
It'll save you some time too.
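The "type in, type out" rule behind this advice carries over to Python 3: os.listdir() returns str names for a str argument and bytes names for a bytes argument. A small sketch follows; note that os.fsencode() uses the filesystem encoding rather than Python 2's ASCII default, and the temporary directory and file name are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "w"):
        pass  # create an empty file
    text_names = os.listdir(tmp)                # str argument   -> str names
    byte_names = os.listdir(os.fsencode(tmp))   # bytes argument -> bytes names

print(text_names, byte_names)  # ['a.txt'] [b'a.txt']
```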

Hey, and leave my Swahili friends alone, will ya! ;)

HTH,
Vincent Wehren

"vincent wehren" <vi*****@visualtrans.de> wrote in message
news:bo**********@news4.tilbu1.nb.home.nl...
|
| "Gerson Kurz" <ge*********@t-online.de> wrote in message
| news:3f*************@news.t-online.de...
| | AAAAAAAARG I hate the way python handles unicode. Here is a nice
| | problem for y'all to enjoy: say you have a variable thats unicode
| |
| | directory = u"c:\temp"
| |
| | Its unicode not because you want it to, but because its for example
| | read from _winreg which returns unicode.
| |
| | You do an os.listdir(directory). Note that all filenames returned are
| | now unicode. (Change introduced I believe in 2.3).
|
| Wrong.
|
| That's only true if type(directory) gives you <type 'unicode'>
| If you call str(directory) before doing os.listdir(directory)
| you (in most cases) won't even notice and can continue doing what you want to

And when I say "in most cases", I mean all those cases where "directory"
doesn't have characters that map to a single-byte value outside of the ASCII
range. In the other cases you'll just do:

directory = directory.encode(your_favorite_and_hopefully_the_right_single_byte_legacy_encoding_here)

before calling os.listdir(directory).
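The caveat can be seen directly: converting with the ASCII codec (what Python 2's str() did by default) fails as soon as the path contains a non-ASCII character, while an explicit single-byte legacy encoding succeeds, but only for characters that code page actually contains. A Python 3 sketch, assuming cp1252 (the Western European Windows code page) as the "favorite" encoding and a made-up path:

```python
LEGACY_ENCODING = "cp1252"  # assumption: substitute your system's code page

def to_legacy_bytes(path, encoding=LEGACY_ENCODING):
    # Raises UnicodeEncodeError for characters the code page lacks.
    return path.encode(encoding)

path = "c:\\temp\\caf\u00e9"  # hypothetical directory name

try:
    path.encode("ascii")  # roughly what str(directory) did in Python 2
except UnicodeEncodeError as exc:
    print("ascii:", exc)

print(to_legacy_bytes(path))  # b'c:\\temp\\caf\xe9'

try:
    to_legacy_bytes("c:\\temp\\\u03a9")  # Greek capital omega is not in cp1252
except UnicodeEncodeError as exc:
    print("cp1252:", exc)
```

Picking the wrong code page does not raise here; it silently produces bytes that name a different (likely nonexistent) directory, which is why the encoding has to be "hopefully the right" one.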
Regards,

Vincent
| do just fine - plus, and that's the good part - you can forget about
| those hacks you suggest later and which some would consider *evil*.
| It'll save yourself some time too.

