Opening a zipfile with an unsupported compression type silently returns an empty filestream instead of throwing an exception

Problem description

I seem to be banging my head against a newbie error, and I am not a newbie. I have a 1.2 GB known-good zipfile 'train.zip' containing a 3.5 GB file 'train.csv'. I open the zipfile and the file itself without any exceptions (no LargeZipFile), but the resulting filestream appears to be empty. (UNIX 'unzip -c ...' confirms the archive is good.) The file objects returned by Python's ZipFile.open() support neither seek() nor tell(), so I can't check the stream position.

The Python distribution is 2.7.3 EPD-free 7.3-1 (32-bit), which should be fine for large zips. The OS is MacOS 10.6.6.

import csv
import os
import zipfile as zf

zip_pathname = os.path.join('/my/data/path/.../', 'train.zip')
#with zf.ZipFile(zip_pathname).open('train.csv') as z:
z = zf.ZipFile(zip_pathname, 'r', zf.ZIP_DEFLATED, allowZip64=True) # I tried all permutations
z.debug = 1
z.testzip() # zipfile integrity is ok

z1 = z.open('train.csv', 'r') # our file keeps coming up empty?

# Check the info to confirm z1 is indeed a valid 3.5Gb file...
z1i = z.getinfo('train.csv')
for att in ('filename', 'file_size', 'compress_size', 'compress_type', 'date_time',  'CRC', 'comment'):
    print '%s:\t' % att, getattr(z1i,att)
# ... and it looks ok. compress_type = 9 ok?
#filename:  train.csv
#file_size: 3729150126
#compress_size: 1284613649
#compress_type: 9
#date_time: (2012, 8, 20, 15, 30, 4)
#CRC:   1679210291

# All attempts to read z1 come up empty?!
# z1.readline() gives ''
# z1.readlines() gives []
# z1.read() takes ~60sec but also returns '' ?

# code I would want to run is:
reader = csv.reader(z1)
header = reader.next()  # first row holds the column names
# ... then hand `reader` to the rest of the program

Recommended answer

The cause is twofold:

  • This file's compression type is 9: Deflate64/Enhanced Deflate (PKWare's proprietary format), as opposed to the more common type 8 (Deflate).
  • There was also a zipfile bug: it did not throw an exception for unsupported compression types. It used to just silently return a bad file object [see Section 4.4.5, "compression method", of the ZIP format specification]. Aargh. How bogus. UPDATE: I filed bug 14313 and it was fixed back in 2012, so zipfile now raises NotImplementedError when the compression type is unknown.
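One way to catch this before reading any data is to inspect each member's compress_type up front. The following is a minimal sketch, assuming a reasonably recent Python 3; the helper name unsupported_members and the SUPPORTED_METHODS set are hypothetical names introduced here for illustration, not part of zipfile.

```python
import zipfile

# Compression methods the standard library can decode.
# ZIP_BZIP2 (12) and ZIP_LZMA (14) only exist on Python 3.3+, hence the hasattr check.
SUPPORTED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
for name in ("ZIP_BZIP2", "ZIP_LZMA"):
    if hasattr(zipfile, name):
        SUPPORTED_METHODS.add(getattr(zipfile, name))


def unsupported_members(zip_path):
    """Return (filename, compress_type) pairs that zipfile is not expected to decode.

    Hypothetical helper: it only reads the central directory, so it is cheap
    even for multi-gigabyte archives.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.compress_type)
            for info in archive.infolist()
            if info.compress_type not in SUPPORTED_METHODS
        ]
```

For the archive in the question, a call such as unsupported_members(zip_pathname) would be expected to list 'train.csv' with compress_type 9, which flags the problem regardless of whether the installed zipfile raises on open.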

A command-line workaround is to unzip the archive and then re-zip it, which yields a plain type 8 (Deflated) entry.

zipfile now throws an exception for this in 2.7 and 3.2+. I guess zipfile will never be able to actually handle type 9, for legal reasons. The Python docs make no mention whatsoever that zipfile cannot handle other compression types :(
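On a Python version that includes the fix, the failure surfaces as NotImplementedError when the member is opened. A minimal sketch of turning that into a more descriptive error follows; the function name read_first_line and the choice of RuntimeError are assumptions made for illustration.

```python
import zipfile


def read_first_line(zip_path, member):
    """Return the first line of `member` as bytes, or raise a descriptive error.

    Hypothetical helper: it relies on zipfile raising NotImplementedError
    for compression methods it cannot decode.
    """
    with zipfile.ZipFile(zip_path) as archive:
        info = archive.getinfo(member)
        try:
            with archive.open(info) as stream:
                return stream.readline()
        except NotImplementedError:
            raise RuntimeError(
                "%s uses compression method %d, which zipfile cannot decode; "
                "unzip and re-zip the archive to get a Deflated entry"
                % (member, info.compress_type)
            )
```

Reporting the numeric method in the message makes it easy to look it up in the ZIP specification's compression-method table.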
