How to read from a zip file within a zip file in Python?
Problem description
I have a file that I want to read that is itself zipped within a zip archive. For example, parent.zip contains child.zip, which contains child.txt. I am having trouble reading child.zip. Can anyone correct my code?
I assume that I need to create child.zip as a file-like object and then open it with a second instance of zipfile, but I'm new to Python, so my zipfile.ZipFile(zfile.open(name)) is probably silly. It raises zipfile.BadZipfile: "File is not a zip file" on child.zip (which I have independently validated).
import logging
import re
import zipfile

with zipfile.ZipFile("parent.zip", "r") as zfile:
    for name in zfile.namelist():
        if re.search(r'\.zip$', name) is not None:
            # We have a zip within a zip
            with zipfile.ZipFile(zfile.open(name)) as zfile2:  # <- raises BadZipfile here
                for name2 in zfile2.namelist():
                    # Now we can extract
                    logging.info("Found internal file: " + name2)
                    print("Processing code goes here")
Recommended answer
When you use the .open() call on a ZipFile instance, you do indeed get an open file handle. However, to read a zip file, the ZipFile class needs a little more: it needs to be able to seek on that file, and the object returned by .open() is not seekable in your case. Only Python 3.7 and newer produce a ZipExtFile object that supports seeking (provided the underlying file handle for the outer zip file is seekable, and nothing is trying to write to the ZipFile object).
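The seeking requirement is easy to check in isolation. Below is a minimal sketch, assuming Python 3 and only the standard library: it builds a small archive in memory, then hands the same bytes to ZipFile once through a stream that refuses to seek and once through a seekable BytesIO. build_zip, NonSeekableStream and try_open are hypothetical helpers for illustration; NonSeekableStream stands in for the handle returned by .open() on Python versions where that handle cannot seek.

```python
import io
import zipfile


def build_zip(members):
    """Hypothetical helper: return the bytes of a zip archive holding `members` (name -> bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


class NonSeekableStream(io.RawIOBase):
    """Hypothetical read-only stream that cannot seek, mimicking a non-seekable member handle."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)

    # seekable() is inherited and returns False; seek() and tell() raise io.UnsupportedOperation.


def try_open(fileobj):
    """Return the member names if ZipFile can read fileobj, else the error text."""
    try:
        with zipfile.ZipFile(fileobj) as zf:
            return zf.namelist()
    except zipfile.BadZipFile as exc:
        return "BadZipFile: %s" % exc


data = build_zip({"child.txt": b"hello from child.txt\n"})
print(try_open(NonSeekableStream(data)))  # fails: ZipFile must seek to the end of the archive
print(try_open(io.BytesIO(data)))         # works: BytesIO is seekable
```

The first call reports BadZipFile with the same "File is not a zip file" message as in the question, even though the bytes form a valid archive: ZipFile locates the central directory by seeking to the end of the file, and that seek fails before any zip data is examined.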
The workaround is to read the whole zip entry into memory using .read(), store it in a BytesIO object (an in-memory file that is seekable) and feed that to ZipFile:
from io import BytesIO
# ...
zfiledata = BytesIO(zfile.read(name))
with zipfile.ZipFile(zfiledata) as zfile2:
    ...  # use zfile2 like any other ZipFile
Or, in your example:
import logging
import re
import zipfile
from io import BytesIO

with zipfile.ZipFile("parent.zip", "r") as zfile:
    for name in zfile.namelist():
        if re.search(r'\.zip$', name) is not None:
            # We have a zip within a zip: read it fully into a seekable in-memory file
            zfiledata = BytesIO(zfile.read(name))
            with zipfile.ZipFile(zfiledata) as zfile2:
                for name2 in zfile2.namelist():
                    # Now we can extract
                    logging.info("Found internal file: " + name2)
                    print("Processing code goes here")
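For a self-contained check of the BytesIO workaround, the sketch below builds parent.zip containing child.zip containing child.txt entirely in memory, then walks it, descending into any member whose name ends in .zip. It assumes Python 3; make_zip and walk_nested_zip are hypothetical helpers, and joining paths as "child.zip/child.txt" is only an illustrative convention, not something zipfile does for you.

```python
import io
import zipfile


def make_zip(members):
    """Hypothetical helper: return the bytes of a zip archive containing `members` (name -> bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def walk_nested_zip(zfile, prefix=""):
    """Yield (path, data) for every regular member, recursing into nested .zip members.

    Each nested archive is read fully into memory with .read() and wrapped in
    BytesIO, which gives ZipFile the seekable file object it needs.
    """
    for name in zfile.namelist():
        if name.endswith("/"):
            continue  # skip directory entries
        path = prefix + name
        data = zfile.read(name)
        if name.lower().endswith(".zip"):
            with zipfile.ZipFile(io.BytesIO(data)) as inner:
                yield from walk_nested_zip(inner, path + "/")
        else:
            yield path, data


# Build parent.zip -> child.zip -> child.txt in memory so the example needs no files on disk.
child_zip = make_zip({"child.txt": b"hello from child.txt\n"})
parent_zip = make_zip({"child.zip": child_zip, "readme.txt": b"top level\n"})

with zipfile.ZipFile(io.BytesIO(parent_zip)) as zfile:
    for path, data in walk_nested_zip(zfile):
        print(path, data)
```

To run it against a real archive, pass "parent.zip" to the outer zipfile.ZipFile instead of io.BytesIO(parent_zip). Keep in mind that every nested archive is held completely in memory while it is being walked, so this approach suits inner archives of modest size.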