zipfile in Python produces not-quite-normal ZIP files

Question

In my project, a set of files is created and packed into a ZIP archive for use on an Android mobile phone. The Android application opens such ZIP files to read initial data and then stores the results of its work in the same ZIPs. I have no access to the source code of this Android app or to the old script that generated the ZIP files before (in fact, I do not know how the old ZIP files were created). But the structure of the ZIP archive is known, and I have written a new Python script to produce the same files.

I ran into the following problem: ZIP files produced by my script cannot be opened by the Android app (an error message about an incorrect file structure appears), but if I unpack all the contents and pack them back into a new ZIP file with the same name using WinZip, 7-Zip or "Send to -> Compressed (zipped) folder" (in Windows 7), the file is processed normally on the phone (which leads me to conclude that the problem is not in the Android application).

The code snippet for packing the folder into a ZIP was as follows:

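import os
import shutil
import zipfile

prefix = 'en_US_00001'  # folder to pack (example value taken from the output below)

# make zip
try:
    with zipfile.ZipFile(prefix + '.zip', 'w') as zipf:
        for root, dirs, files in os.walk(prefix):
            for file in files:
                # only files are added; the folders themselves get no entries
                zipf.write(os.path.join(root, file))
    # remove dir, that was packed
    shutil.rmtree(prefix)
    # Report about resulting
    print('File ' + prefix + '.zip was created')
except Exception as e:
    print('Unexpected error occurred while creating file ' + prefix + '.zip:', e)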

After I noticed that the files were not compressed, I added the compression option:

 zipfile.ZipFile(prefix + '.zip', 'w', zipfile.ZIP_DEFLATED) 

But this didn't solve my problem, and setting allowZip64=True didn't change the situation either.

By the way, a ZIP file produced with zipfile.ZIP_DEFLATED is about 5 kilobytes smaller than the ZIP file produced by Windows and about 14 kilobytes smaller than 7-Zip's result for the same archive content. At the same time, I can open all of these ZIP files for visual comparison in both 7-Zip and Windows Explorer.
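A few kilobytes of difference between archivers may come from compressor settings, but also from per-entry metadata such as extra fields and directory entries. A minimal sketch for comparing archives entry by entry, using only standard ZipInfo attributes; describe_archive is a hypothetical helper, and the small in-memory archive only stands in for the real en_US_00001.zip files:

```python
import io
import zipfile


def describe_archive(path_or_file):
    """Return one (name, method, size, compressed size, extra length) tuple per entry."""
    rows = []
    with zipfile.ZipFile(path_or_file) as zf:
        for info in zf.infolist():
            rows.append((
                info.filename,
                info.compress_type,  # 0 = ZIP_STORED, 8 = ZIP_DEFLATED
                info.file_size,      # uncompressed size in bytes
                info.compress_size,  # bytes actually stored in the archive
                len(info.extra),     # bytes of extra-field metadata
            ))
    return rows


# Build a tiny archive in memory so the sketch runs anywhere; in practice
# you would pass the ZIP produced by each archiver instead.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('demo/data.txt', 'abc' * 1000)

rows = describe_archive(buf)
for row in rows:
    print(row)
```

Running it on both archives side by side shows whether the size gap is in the compressed data or in the metadata.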

So I have three related questions:

1) What could cause such strange behavior in my zipfile script?

2) How else can I influence what zipfile writes?

3) How can I check a ZIP file created with zipfile to find possible structural problems, or make sure there are none?

Of course, if I have to give up on zipfile, I can use an external archiver (e.g. 7-Zip) to pack the files, but I would like to find an elegant solution if one exists.
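For question 3, the standard library offers ZipFile.testzip(), which reads every member and returns the name of the first one with a bad CRC or header. It checks nothing beyond that, so the sketch below (check_zip is a hypothetical name) adds two heuristic checks: backslashes in entry names, and parent folders that have no 'folder/' entry of their own. The ZIP format does not require directory entries; that check is there only because the 7-Zip archive shown in UPDATE 2 contains them.

```python
import io
import zipfile


def check_zip(path_or_file):
    """Return a list of human-readable problems found in a ZIP archive."""
    problems = []
    with zipfile.ZipFile(path_or_file) as zf:
        # testzip() reads every member; it returns the name of the first
        # member with a bad CRC or header, or None if all are readable.
        bad = zf.testzip()
        if bad is not None:
            problems.append('corrupt member: ' + bad)
        names = set(zf.namelist())
        for name in zf.namelist():
            if '\\' in name:
                problems.append('backslash in entry name: ' + name)
            # Heuristic: every parent folder should have its own 'folder/' entry.
            parts = name.rstrip('/').split('/')[:-1]
            for i in range(1, len(parts) + 1):
                parent = '/'.join(parts[:i]) + '/'
                if parent not in names:
                    problems.append('missing directory entry: ' + parent)
                    names.add(parent)  # report each missing folder only once
    return problems


# Usage: an archive with files only, like the one from the first script ...
files_only = io.BytesIO()
with zipfile.ZipFile(files_only, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('a/b/c.txt', 'x')

# ... and the same content with explicit directory entries.
with_dirs = io.BytesIO()
with zipfile.ZipFile(with_dirs, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr(zipfile.ZipInfo('a/'), '')
    zf.writestr(zipfile.ZipInfo('a/b/'), '')
    zf.writestr('a/b/c.txt', 'x')

print(check_zip(files_only))
print(check_zip(with_dirs))
```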

UPDATE:

To check the contents of the ZIP file created with zipfile, I did the following:

import os
import shutil
import zipfile

prefix = 'en_US_00001'  # folder to pack (example value taken from the output below)

# make zip
flist = []
try:
    with zipfile.ZipFile(prefix + '.zip', 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(prefix):
            for file in files:
                zipf.write(os.path.join(root, file))
                # Store item in the list, with the '/' separator used in ZIP entry names
                flist.append(os.path.join(root, file).replace("\\", "/"))
    # remove dir, that was packed
    shutil.rmtree(prefix)
    # Report about resulting
    print('File ' + prefix + '.zip was created')
except Exception as e:
    print('Unexpected error occurred while creating file ' + prefix + '.zip:', e)
# Check of zip (ZipFile is a context manager, so contextlib.closing is not needed)
with zipfile.ZipFile(prefix + '.zip') as zfile:
    for info in zfile.infolist():
        kind = 'ZIP_DEFLATED' if info.compress_type == zipfile.ZIP_DEFLATED else 'NOT ZIP_DEFLATED'
        print(info.filename + '  (extra = ' + str(info.extra) + '; compress_type = ' + kind + ')')
        # remove item from list
        if info.filename in flist:
            flist.remove(info.filename)
        else:
            print(info.filename + ' is unexpected item')
print('Number of items that were missed:')
print(len(flist))

This produced the following output:

File en_US_00001.zip was created
en_US_00001/en_US_00001_0001/en_US_00001_0001_big.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_info.xml  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_small.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.pkl  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.tex  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_user.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_big.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_info.xml  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_small.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.pkl  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.tex  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_user.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_big.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_info.xml  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_small.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.pkl  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.tex  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_user.png  (extra = b''; compress_type = ZIP_DEFLATED)
Number of items that were missed:
0

So everything that was written was read back, but the question remains: has everything that is necessary been written? For example, Harold mentioned relative paths in the comments... perhaps that is the key to the answer.
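On the relative-paths point: ZipFile.write(path) derives the entry name from path itself (dropping any drive letter and leading slashes, and converting os.sep to '/'), so the stored names depend on how the path was spelled and on the current working directory. A sketch that makes the names explicit through the arcname parameter; pack_folder is a hypothetical helper, and it deliberately does not add directory entries:

```python
import os
import tempfile
import zipfile


def pack_folder(folder, zip_path):
    """Pack `folder` so every entry name starts with the folder's own name,
    uses '/' separators and is independent of the current directory."""
    folder = os.path.abspath(folder)
    base = os.path.dirname(folder)  # entry names are relative to the parent
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(folder):
            for file in files:
                full = os.path.join(root, file)
                arcname = os.path.relpath(full, base).replace(os.sep, '/')
                zf.write(full, arcname=arcname)


# Usage: build a throwaway tree somewhere other than the current directory.
with tempfile.TemporaryDirectory() as tmp:
    leaf = os.path.join(tmp, 'en_US_00001', 'en_US_00001_0001')
    os.makedirs(leaf)
    with open(os.path.join(leaf, 'info.xml'), 'w') as f:
        f.write('<info/>')
    zip_path = os.path.join(tmp, 'en_US_00001.zip')
    pack_folder(os.path.join(tmp, 'en_US_00001'), zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()

print(names)  # the absolute temp path does not leak into the entry names
```

Here the names match what the question's script produced, so relative paths alone would not explain the difference; they only stop the names from depending on where the script runs.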

UPDATE 2

When I replaced zipfile with a call to the external 7-Zip executable:

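import shutil
import subprocess
import zipfile

prefix = 'en_US_00001'  # folder to pack (example value taken from the output below)

# make zip (7z.exe must be on PATH; check=True raises if 7-Zip reports an error)
subprocess.run(["7z.exe", "a", prefix + ".zip", prefix], check=True)
shutil.rmtree(prefix)
# Check of zip (ZipFile is a context manager, so contextlib.closing is not needed)
with zipfile.ZipFile(prefix + '.zip') as zfile:
    for info in zfile.infolist():
        print(info.filename)
        print('  (extra = ' + str(info.extra) + '; compress_type = ' + str(info.compress_type) + ')')
print('Values for compress_type:')
print(str(zipfile.ZIP_DEFLATED) + ' = ZIP_DEFLATED')
print(str(zipfile.ZIP_STORED) + ' = ZIP_STORED')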

the script produced the following output:

Creating archive en_US_00001.zip

Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_big.png
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_info.xml
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_small.png
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_source.pkl
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_source.tex
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_user.png
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_big.png
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_info.xml
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_small.png
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_source.pkl
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_source.tex
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_user.png
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_big.png
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_info.xml
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_small.png
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_source.pkl
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_source.tex
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_user.png

Everything is Ok

en_US_00001/
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00Faf\xd2Y\xf9\xd1\x01Faf\xd2Y\xf9\xd1\x01%\xc9c\xd2Y\xf9\xd1\x01'; compress_type = 0)
en_US_00001/en_US_00001_0001/
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xbe(e\xd2Y\xf9\xd1\x01\xbe(e\xd2Y\xf9\xd1\x016\xf0c\xd2Y\xf9\xd1\x01'; compress_type = 0)
en_US_00001/en_US_00001_0001/en_US_00001_0001_big.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00G\x17d\xd2Y\xf9\xd1\x01G\x17d\xd2Y\xf9\xd1\x01G\x17d\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_info.xml
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00X>d\xd2Y\xf9\xd1\x01X>d\xd2Y\xf9\xd1\x01X>d\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_small.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00z\x8cd\xd2Y\xf9\xd1\x01ied\xd2Y\xf9\xd1\x01ied\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.pkl
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x8b\xb3d\xd2Y\xf9\xd1\x01\x8b\xb3d\xd2Y\xf9\xd1\x01\x8b\xb3d\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.tex
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xad\x01e\xd2Y\xf9\xd1\x01\xad\x01e\xd2Y\xf9\xd1\x01\xad\x01e\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_user.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xbe(e\xd2Y\xf9\xd1\x01\xbe(e\xd2Y\xf9\xd1\x01\xbe(e\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x005:f\xd2Y\xf9\xd1\x015:f\xd2Y\xf9\xd1\x01\xcfOe\xd2Y\xf9\xd1\x01'; compress_type = 0)
en_US_00001/en_US_00001_0002/en_US_00001_0002_big.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xe0ve\xd2Y\xf9\xd1\x01\xcfOe\xd2Y\xf9\xd1\x01\xcfOe\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_info.xml
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xf1\x9de\xd2Y\xf9\xd1\x01\xe0ve\xd2Y\xf9\xd1\x01\xe0ve\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_small.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x02\xc5e\xd2Y\xf9\xd1\x01\x02\xc5e\xd2Y\xf9\xd1\x01\x02\xc5e\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.pkl
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x13\xece\xd2Y\xf9\xd1\x01\x13\xece\xd2Y\xf9\xd1\x01\x13\xece\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.tex
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00$\x13f\xd2Y\xf9\xd1\x01$\x13f\xd2Y\xf9\xd1\x01$\x13f\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_user.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x005:f\xd2Y\xf9\xd1\x015:f\xd2Y\xf9\xd1\x015:f\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xdf\xc0g\xd2Y\xf9\xd1\x01\xdf\xc0g\xd2Y\xf9\xd1\x01Faf\xd2Y\xf9\xd1\x01'; compress_type = 0)
en_US_00001/en_US_00001_0003/en_US_00001_0003_big.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00W\x88f\xd2Y\xf9\xd1\x01W\x88f\xd2Y\xf9\xd1\x01W\x88f\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_info.xml
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00h\xaff\xd2Y\xf9\xd1\x01h\xaff\xd2Y\xf9\xd1\x01h\xaff\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_small.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x9b$g\xd2Y\xf9\xd1\x01y\xd6f\xd2Y\xf9\xd1\x01y\xd6f\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.pkl
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xacKg\xd2Y\xf9\xd1\x01\xacKg\xd2Y\xf9\xd1\x01\xacKg\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.tex
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xce\x99g\xd2Y\xf9\xd1\x01\xce\x99g\xd2Y\xf9\xd1\x01\xce\x99g\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_user.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xdf\xc0g\xd2Y\xf9\xd1\x01\xdf\xc0g\xd2Y\xf9\xd1\x01\xdf\xc0g\xd2Y\xf9\xd1\x01'; compress_type = 8)

Values for compress_type:
8 = ZIP_DEFLATED
0 = ZIP_STORED

As I understand it, the most important findings are:

  • entries for the folders themselves (e.g. en_US_00001/, en_US_00001/en_US_00001_0001/), which were absent from the ZIP produced by my zipfile code
  • folder entries have compress_type == ZIP_STORED, while file entries have compress_type == ZIP_DEFLATED
  • the extra fields differ: 7-Zip fills them with fairly long byte strings, while zipfile leaves them empty (b'')
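The long byte strings in 7-Zip's extra fields look like the NTFS extra field (header ID 0x000A in PKWARE's APPNOTE.TXT), which holds the modification, access and creation times as Windows FILETIME values. A sketch that decodes one, assuming that layout; parse_ntfs_extra is a hypothetical helper, and the sample bytes are copied from the en_US_00001/ entry above:

```python
import struct
from datetime import datetime, timedelta, timezone

NTFS_EXTRA_ID = 0x000A  # NTFS extra field header ID (PKWARE APPNOTE.TXT)
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def parse_ntfs_extra(extra):
    """Return (mtime, atime, ctime) from an NTFS extra field, or None."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from('<HH', extra, pos)
        data = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        if header_id != NTFS_EXTRA_ID:
            continue
        # 4 reserved bytes, then tagged attributes; tag 1 holds 3 FILETIMEs
        p = 4
        while p + 4 <= len(data):
            tag, tag_size = struct.unpack_from('<HH', data, p)
            if tag == 1 and tag_size == 24 and p + 28 <= len(data):
                ticks = struct.unpack_from('<QQQ', data, p + 4)
                # FILETIME = 100-nanosecond intervals since 1601-01-01 UTC
                return tuple(FILETIME_EPOCH + timedelta(microseconds=t // 10)
                             for t in ticks)
            p += 4 + tag_size
    return None


# Extra field of the "en_US_00001/" entry from the 7-Zip archive above
extra = (b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00Faf\xd2Y\xf9\xd1\x01'
         b'Faf\xd2Y\xf9\xd1\x01%\xc9c\xd2Y\xf9\xd1\x01')
times = parse_ntfs_extra(extra)
print(times)
```

If the decoding is right, these extras carry only timestamps; the answer below is consistent with that, since its archive still has empty extra fields and is accepted.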

Answer

Based on the differences listed in UPDATE 2 of the question and on examples from another question about zipfile, I tried the following code to add directory entries to the ZIP file and check the result:

import os
import shutil
import zipfile

prefix = 'en_US_00001'  # folder to pack (example value taken from the output below)

# make zip
try:
    with zipfile.ZipFile(prefix + '.zip', 'w', zipfile.ZIP_DEFLATED) as zipf:
        # Entry for the top-level folder; the trailing '/' marks it as a directory.
        # A bare ZipInfo defaults to ZIP_STORED, so folder entries are not compressed.
        info = zipfile.ZipInfo(prefix + '/')
        zipf.writestr(info, '')
        for root, dirs, files in os.walk(prefix):
            for d in dirs:
                # ZIP entry names always use '/', whatever os.sep is
                info = zipfile.ZipInfo(os.path.join(root, d).replace(os.sep, '/') + '/')
                zipf.writestr(info, '')
            for file in files:
                zipf.write(os.path.join(root, file))
    # remove dir, that was packed
    shutil.rmtree(prefix)
    # Report about resulting
    print('File ' + prefix + '.zip was created')
except Exception as e:
    print('Unexpected error occurred while creating file ' + prefix + '.zip:', e)
# Check zip content (ZipFile is a context manager, so contextlib.closing is not needed)
with zipfile.ZipFile(prefix + '.zip') as zfile:
    for info in zfile.infolist():
        print(info.filename)
        print('  (extra = ' + str(info.extra) + '; compress_type = ' + str(info.compress_type) + ')')
print('Values for compress_type:')
print(str(zipfile.ZIP_DEFLATED) + ' = ZIP_DEFLATED')
print(str(zipfile.ZIP_STORED) + ' = ZIP_STORED')

The output was:

File en_US_00001.zip was created
en_US_00001/
    (extra = b''; compress_type = 0)
en_US_00001/en_US_00001_0001/
    (extra = b''; compress_type = 0)
en_US_00001/en_US_00001_0002/
    (extra = b''; compress_type = 0)
en_US_00001/en_US_00001_0003/
    (extra = b''; compress_type = 0)
en_US_00001/en_US_00001_0001/en_US_00001_0001_big.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_info.xml
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_small.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.pkl
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.tex
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_user.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_big.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_info.xml
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_small.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.pkl
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.tex
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_user.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_big.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_info.xml
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_small.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.pkl
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.tex
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_user.png
    (extra = b''; compress_type = 8)
Values for compress_type:
8 = ZIP_DEFLATED
0 = ZIP_STORED

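Adding a trailing slash to the directory names (+'\\' or +'/') turned out to be mandatory: zipfile treats an entry as a directory only if its name ends with '/'. The '\\' form works on Windows only because ZipInfo replaces os.sep with '/'; on other systems use '/'.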
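A quick way to see the effect of the trailing slash, and why the folder entries in the output above have compress_type = 0: a ZipInfo built by hand defaults to ZIP_STORED, and ZipInfo.is_dir() only looks at whether the name ends in '/'. A minimal in-memory sketch (the entry names are just examples):

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
    # Hand-built ZipInfo objects keep their default compress_type (ZIP_STORED),
    # even though the archive itself was opened with ZIP_DEFLATED.
    zf.writestr(zipfile.ZipInfo('en_US_00001/'), '')  # trailing slash: a folder
    zf.writestr(zipfile.ZipInfo('en_US_00002'), '')   # no slash: an empty file
    # A plain string name uses the archive's default compression.
    zf.writestr('en_US_00001/file.txt', 'data')

with zipfile.ZipFile(buf) as zf:
    kinds = {info.filename: (info.is_dir(), info.compress_type)
             for info in zf.infolist()}

print(kinds)
```

On Python 3.11 and later, ZipFile.mkdir() can create such directory entries directly.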

And, most importantly, the ZIP file is now accepted by the Android application.
