How do you read a file inside a zip file as text, not bytes?
Question
A simple program for reading a CSV file inside a zip file works in Python 2.7, but not in Python 3.2:
$ cat test_zip_file_py3k.py
import csv, sys, zipfile
zip_file = zipfile.ZipFile(sys.argv[1])
items_file = zip_file.open('items.csv', 'rU')
for row in csv.DictReader(items_file):
    pass
$ python2.7 test_zip_file_py3k.py ~/data.zip
$ python3.2 test_zip_file_py3k.py ~/data.zip
Traceback (most recent call last):
  File "test_zip_file_py3k.py", line 8, in <module>
    for row in csv.DictReader(items_file):
  File "/home/msabramo/run/lib/python3.2/csv.py", line 109, in __next__
    self.fieldnames
  File "/home/msabramo/run/lib/python3.2/csv.py", line 96, in fieldnames
    self._fieldnames = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
So the csv module in Python 3 wants to see a text file, but zipfile.ZipFile.open returns a zipfile.ZipExtFile that is always treated as binary data.
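The mismatch can be seen without csv at all. The following is a minimal sketch, assuming a small archive built in memory (the member name items.csv matches the question, but its contents are made up for illustration): a line read from the object returned by ZipFile.open is a bytes object, not a str.

```python
import io
import zipfile

# Build a small zip archive in memory (hypothetical contents, for illustration).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("items.csv", "name,price\nwidget,3\n")

# ZipFile.open returns a binary file object, so readline() yields bytes.
with zipfile.ZipFile(buffer) as archive:
    with archive.open("items.csv") as member:
        first_line = member.readline()

print(type(first_line).__name__, first_line)
```

The csv module in Python 3 iterates over str lines, which is why it rejects this object.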
How does one make this work in Python 3?
Recommended answer
I just noticed that Lennart's answer didn't work with Python 3.1, but it does work with Python 3.2. They've enhanced zipfile.ZipExtFile in Python 3.2 (see release notes). These changes appear to make zipfile.ZipExtFile work nicely with io.TextIOWrapper.
Incidentally, it also works in Python 3.1 if you uncomment the hacky lines below to monkey-patch zipfile.ZipExtFile, not that I would recommend this sort of hackery. I include it only to illustrate the essence of what was done in Python 3.2 to make things work nicely.
$ cat test_zip_file_py3k.py
import csv, io, sys, zipfile
zip_file = zipfile.ZipFile(sys.argv[1])
items_file = zip_file.open('items.csv', 'rU')
# items_file.readable = lambda: True
# items_file.writable = lambda: False
# items_file.seekable = lambda: False
# items_file.read1 = items_file.read
items_file = io.TextIOWrapper(items_file)
for idx, row in enumerate(csv.DictReader(items_file)):
    print('Processing row {0} -- row = {1}'.format(idx, row))
If I had to support py3k < 3.2, then I would go with the solution in my other answer.
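For readers on a current Python 3, here is a self-contained sketch of the same io.TextIOWrapper approach. It is not the code from the answer above: the helper name read_csv_rows, the UTF-8 default encoding, and the in-memory sample archive are assumptions made for illustration. It also omits the 'rU' mode, since, as far as I know, ZipFile.open dropped support for the 'U' modes in Python 3.6.

```python
import csv
import io
import zipfile


def read_csv_rows(archive_path_or_file, member_name, encoding="utf-8"):
    """Return the rows of a CSV member of a zip archive as dictionaries.

    Hypothetical helper for illustration; the encoding is an assumption
    and should match how the CSV was actually written.
    """
    with zipfile.ZipFile(archive_path_or_file) as archive:
        with archive.open(member_name) as binary_member:
            # Wrap the binary member so csv sees str lines.
            # newline="" lets the csv module handle line endings itself.
            text_member = io.TextIOWrapper(
                binary_member, encoding=encoding, newline=""
            )
            # Read everything before the archive is closed.
            return list(csv.DictReader(text_member))


# Build a sample archive in memory so the sketch runs on its own.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("items.csv", "name,price\r\nwidget,3\r\ngadget,5\r\n")

rows = read_csv_rows(buffer, "items.csv")
for idx, row in enumerate(rows):
    print("Processing row {0} -- row = {1}".format(idx, row))
```

Passing an explicit encoding avoids depending on the platform default, and collecting the rows into a list keeps the reader from touching the member after the archive has been closed. For very large files, iterating inside the with blocks instead would avoid holding every row in memory.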