Unicode and Zipfile problems

Views: 103  Posted: 2019/6/5 3:55:19  Tag: python


Problem description

AAAAAAAARG I hate the way Python handles unicode. Here is a nice
problem for y'all to enjoy: say you have a variable that's unicode

    directory = u"c:\\temp"

It's unicode not because you want it to be, but because it was, for example,
read from _winreg, which returns unicode.

You do an os.listdir(directory). Note that all filenames returned are
now unicode. (A change introduced, I believe, in 2.3.)

You add the filenames to a zipfile.ZipFile object. Sometimes, you will
get this exception:

    Traceback (most recent call last):
      File "collect_trace_info.py", line 65, in CollectTraceInfo
        z.write(pathname)
      File "C:\Python23\lib\zipfile.py", line 416, in write
        self.fp.write(zinfo.FileHeader())
      File "C:\Python23\lib\zipfile.py", line 170, in FileHeader
        return header + self.filename + self.extra
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x88 in position 12: ordinal not in range(128)

After you have regained your composure, you find the reason: "header"
is a byte string generated by struct.pack(). self.filename, however, is a
unicode string, because os.listdir returned it as unicode. Concatenating
the two makes Python decode "header" as ASCII first. If "header" contains
any byte above 0x7F - which can but need not happen, depending on the
type of file - you have an exception waiting for you - sometimes. Great.
(The same will probably occur if filename contains chars > 0x7F.) The
problem does not occur if you have "str" type filenames, because then no
back-and-forth conversion is being made.
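
The failure mode described above can be reproduced in a small sketch. It is written for Python 3, where bytes and text no longer mix implicitly, so the ASCII decode that Python 2 performed behind the scenes is spelled out in a hypothetical helper, py2_style_concat. The header layout and the values packed into it are made up for illustration; they are not zipfile's real header format.

```python
import struct

def py2_style_concat(byte_string, text):
    """Mimic Python 2's implicit coercion: decode the bytes as ASCII, then join."""
    return byte_string.decode("ascii") + text

def make_header(value):
    """Pack a made-up header: a 4-byte signature plus one 32-bit field."""
    return struct.pack("<4sL", b"PK\x03\x04", value)

filename = "temp.txt"  # a text string, like the names os.listdir returned

# All packed bytes are below 0x80, so the concatenation happens to work.
lucky = py2_style_concat(make_header(0x12345678), filename)
print(repr(lucky))

# One packed byte is 0x88, so the ASCII decode fails.
try:
    py2_style_concat(make_header(0x12345688), filename)
except UnicodeDecodeError as exc:
    print(exc)
```

Whether the error shows up depends only on the packed values, which is why the original problem appears for some files and not for others.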

There is a simple fix: byte-encode the path before calling z.write(). Here
is some sample code that reproduces the problem:

    import os, zipfile, win32api

    def test(directory):
        z = zipfile.ZipFile(os.path.join(directory, "temp.zip"), "w", zipfile.ZIP_DEFLATED)
        for filename in os.listdir(directory):
            z.write(os.path.join(directory, filename))
        z.close()

    if __name__ == "__main__":
        test(unicode(win32api.GetSystemDirectory()))

Note: It might work on your system, depending on the types of files.
To fix it, use

    z.write(os.path.join(directory, filename).encode("latin-1"))

But to my thinking, this is a bug in zipfile.py, really.
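
As a hedged illustration of the workaround above: encoding to Latin-1 only succeeds when every character in the path has a Latin-1 code, so the explicit encode can itself raise. The sketch below is Python 3, and the helper name encode_path and the paths are made up for illustration. It shows a path that encodes cleanly, one that does not, and the lossy errors="replace" alternative.

```python
def encode_path(path, encoding="latin-1", errors="strict"):
    """Return the path as a byte string in the given single-byte encoding."""
    return path.encode(encoding, errors)

# Every character has a Latin-1 code, so this encodes cleanly.
print(encode_path("c:\\temp\\caf\u00e9.txt"))

# The euro sign is not part of Latin-1, so strict encoding raises.
try:
    encode_path("c:\\temp\\\u20ac.txt")
except UnicodeEncodeError as exc:
    print(exc)

# With errors="replace" the unencodable character becomes "?".
print(encode_path("c:\\temp\\\u20ac.txt", errors="replace"))
```

The replaced name no longer matches the file on disk, so this is only useful for the name stored in the archive, not for opening the file.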

Now, could anybody please just write an
"i-don't-care-if-my-app-can-display-klingon-characters" raw byte
encoding which doesn't throw any exceptions and doesn't care whether
or not the characters are in the 0x7F range? It's OK if I cannot port
my batch scripts to Swahili, really.

Recommended answer

Here is my "dontcare" codec, ugly on purpose:

    import codecs

    # Encode: clamp every code point to 0..255 and emit it as one byte.
    def E(i, e=''):
        l = lambda c: chr(min(ord(c), 255))
        r = "".join(map(l, i))
        return (r, len(r))

    # Decode: map every byte straight to the code point of the same value.
    def D(i, e=''):
        l = lambda c: unichr(ord(c))
        r = u"".join(map(l, i))
        return (r, len(r))

    class c(codecs.Codec):
        def encode(self, i, e=''):
            return E(i, e)
        def decode(self, i, e=''):
            return D(i, e)

    class w(c, codecs.StreamWriter): pass
    class r(c, codecs.StreamReader): pass

    getregentry = lambda: (E, D, r, w)

To install, save it as python23\lib\encodings\dontcare.py and enable it
in site.py. Here is some test code:

    try:
        print unicode(chr(0xFF))
    except UnicodeDecodeError, e:
        print e

    try:
        print unichr(12345)
    except UnicodeEncodeError, e:
        print e
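
For readers on Python 3, the same idea can be sketched with codecs.register instead of dropping a module into the encodings package. This is a loose adaptation, not the poster's code: the function names are hypothetical, and it only makes the codec available by name; it does not change any default encoding.

```python
import codecs

def dontcare_encode(text, errors="strict"):
    """Clamp each code point to 0..255 and emit it as a single byte."""
    data = bytes(min(ord(ch), 255) for ch in text)
    return data, len(text)

def dontcare_decode(data, errors="strict"):
    """Map each byte to the code point with the same value."""
    raw = bytes(data)
    return raw.decode("latin-1"), len(raw)

def _search(name):
    # Called by the codec registry with the normalized encoding name.
    if name == "dontcare":
        return codecs.CodecInfo(dontcare_encode, dontcare_decode, name="dontcare")
    return None

codecs.register(_search)

print("caf\u00e9".encode("dontcare"))   # characters up to 255 pass through
print(chr(12345).encode("dontcare"))    # anything larger is clamped to 0xFF
print(b"\x88\xff".decode("dontcare"))   # decoding never fails
```

Clamping is lossy on purpose: every character above 255 collapses to the same byte, so the round trip is only faithful for text that fits in one byte per character.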

"Gerson Kurz" <ge*********@t-online.de> wrote in message
news:3f*************@news.t-online.de...
| AAAAAAAARG I hate the way python handles unicode. Here is a nice
| problem for y'all to enjoy: say you have a variable thats unicode
|
| directory = u"c:\\temp"
|
| Its unicode not because you want it to, but because its for example
| read from _winreg which returns unicode.
|
| You do an os.listdir(directory). Note that all filenames returned are
| now unicode. (Change introduced I believe in 2.3).

Wrong.

That's only true if type(directory) gives you <type 'unicode'>.
If you call str(directory) before doing os.listdir(directory),
you (in most cases) won't even notice and can continue doing what you
want to do just fine - plus, and that's the good part - you can forget
about those hacks you suggest later, which some would consider *evil*.
It'll save you some time too.

Hey, and leave my Swahili friends alone, will ya! ;)

HTH,
Vincent Wehren
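
The point made in this reply, that the type of the argument decides the type of the names returned, still holds in current Python. A minimal Python 3 sketch follows; the temporary directory and file name are made up for illustration, and os.fsencode stands in for the str() call from the reply, since Python 3's str() does not produce bytes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    # Create one file so the listing is not empty.
    with open(os.path.join(directory, "example.txt"), "w") as handle:
        handle.write("hello")

    text_names = os.listdir(directory)               # text path in, text names out
    byte_names = os.listdir(os.fsencode(directory))  # bytes path in, bytes names out

print(text_names)
print(byte_names)
```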

"vincent wehren" <vi*****@visualtrans.de> wrote in message
news:bo**********@news4.tilbu1.nb.home.nl...
|
| "Gerson Kurz" <ge*********@t-online.de> wrote in message
| news:3f*************@news.t-online.de...
| | You do an os.listdir(directory). Note that all filenames returned are
| | now unicode. (Change introduced I believe in 2.3).
|
| Wrong.
|
| That's only true if type(directory) gives you <type 'unicode'>
| If you call str(directory) before doing os.listdir(directory)
| you (in most cases) won't even notice and can continue doing what you
| want to do just fine

And when I say "in most cases", I mean all those cases where "directory"
doesn't have characters that map to a single-byte value outside of the ASCII
range. In other cases you'll just go:

    directory = directory.encode(your_favorite_and_hopefully_the_right_single_byte_legacy_encoding_here)

before calling os.listdir(directory).

Regards,

Vincent
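
The two cases in this follow-up can be folded into one small helper, shown here as a Python 3 sketch. In Python 2, str(directory) encoded with the default ASCII codec, which is why it only worked for pure-ASCII paths. The helper name to_byte_path and the cp1252 default are assumptions for illustration; the right legacy encoding depends on the system.

```python
def to_byte_path(directory, legacy_encoding="cp1252"):
    """Encode a text path to bytes: ASCII if possible, else a legacy code page."""
    try:
        return directory.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII characters present: fall back to the chosen single-byte encoding.
        return directory.encode(legacy_encoding)

print(to_byte_path("c:\\temp"))
print(to_byte_path("c:\\caf\u00e9"))
```

If the path contains a character the legacy encoding cannot represent, the second encode raises UnicodeEncodeError, which is arguably the honest outcome for a path that has no byte-string form in that code page.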
