zipfile.BadZipFile: Bad CRC-32 when extracting a password protected .zip & .zip goes corrupt on extract

Problem description

I am trying to extract a password-protected .zip that contains a single .txt document (say, Congrats.txt). Congrats.txt has text in it, so it is not 0 KB in size. It is placed in a .zip (let's call it zipv1.zip) protected with the password dominique. That password is stored among other words and names in another .txt (which we'll call file.txt for the sake of this question).

If I run the code below with python Program.py -z zipv1.zip -f file.txt (assuming all these files are in the same folder as Program.py), my program reports dominique as the correct password for zipv1.zip among the other words/passwords in file.txt and extracts zipv1.zip, but Congrats.txt is empty and 0 KB in size.

My code so far is as follows:

import argparse
import multiprocessing
import zipfile

parser = argparse.ArgumentParser(description="Unzips a password protected .zip", usage="Program.py -z zip.zip -f file.txt")
# Creates -z arg
parser.add_argument("-z", "--zip", metavar="", required=True, help="Location and the name of the .zip file.")
# Creates -f arg
parser.add_argument("-f", "--file", metavar="", required=True, help="Location and the name of file.txt.")
args = parser.parse_args()


def extract_zip(zip_filename, password):
    try:
        zip_file = zipfile.ZipFile(zip_filename)
        zip_file.extractall(pwd=password)
        print(f"[+] Password for the .zip: {password.decode('utf-8')} \n")
    except:
        # If a password fails, it moves to the next password without notifying the user. If all passwords fail, it will print nothing in the command prompt.
        pass


def main(zip, file):
    if (zip == None) | (file == None):
        # If the args are not used, it displays how to use them to the user.
        print(parser.usage)
        exit(0)
    # Opens the word list/password list/dictionary in "read binary" mode.
    txt_file = open(file, "rb")
    # Allows 8 instances of Python to be ran simultaneously.
    with multiprocessing.Pool(8) as pool:
        # "starmap" expands the tuples as 2 separate arguments to fit "extract_zip"
        pool.starmap(extract_zip, [(zip, line.strip()) for line in txt_file])


if __name__ == '__main__':
    main(args.zip, args.file)

However, if I use another zip (zipv2.zip), created the same way as zipv1.zip except that Congrats.txt sits in a folder that is zipped along with it, I get the same output as for zipv1.zip, but this time Congrats.txt is extracted together with its folder and is intact: both its text and its size are preserved.

To solve this, I read zipfile's documentation and found that if a password doesn't match the .zip, a RuntimeError is raised. So I changed except: in the code to except RuntimeError: and got this error when trying to unzip zipv1.zip:

(venv) C:\Users\USER\Documents\Jetbrains\PyCharm\Program>Program.py -z zipv1.zip -f file.txt
[+] Password for the .zip: dominique

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
result = (True, func(*args, **kwds))
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 47, in starmapstar
return list(itertools.starmap(args[0], args[1]))
  File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 16, in extract_zip
zip_file.extractall(pwd=password)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1594, in extractall
self._extract_member(zipinfo, path, pwd)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1649, in _extract_member
shutil.copyfileobj(source, target)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\shutil.py", line 79, in copyfileobj
buf = fsrc.read(length)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 876, in read
data = self._read1(n)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 966, in _read1
self._update_crc(data)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 894, in _update_crc
raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 'Congrats.txt'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 38, in <module>
main(args.zip, args.file)
  File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 33, in main
pool.starmap(extract_zip, [(zip, line.strip()) for line in txt_file])
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 276, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
raise self._value
zipfile.BadZipFile: Bad CRC-32 for file 'Congrats.txt'

The result is the same, though: the password was found in file.txt and zipv1.zip was extracted, but Congrats.txt was empty and 0 KB in size. So I ran the program again, this time on zipv2.zip, and got this:

(venv) C:\Users\USER\Documents\Jetbrains\PyCharm\Program>Program.py -z zipv2.zip -f file.txt
[+] Password for the .zip: dominique

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
result = (True, func(*args, **kwds))
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 47, in starmapstar
return list(itertools.starmap(args[0], args[1]))
  File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 16, in extract_zip
zip_file.extractall(pwd=password)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1594, in extractall
self._extract_member(zipinfo, path, pwd)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1649, in _extract_member
shutil.copyfileobj(source, target)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\shutil.py", line 79, in copyfileobj
buf = fsrc.read(length)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 876, in read
data = self._read1(n)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 966, in _read1
self._update_crc(data)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 894, in _update_crc
raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 'Congrats.txt'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 38, in <module>
main(args.zip, args.file)
  File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 33, in main
pool.starmap(extract_zip, [(zip, line.strip()) for line in txt_file])
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 276, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
raise self._value
zipfile.BadZipFile: Bad CRC-32 for file 'Congrats.txt'

Again, the same result as before: the folder was extracted successfully, and Congrats.txt was extracted with its text and size intact.

I did take a look at this similar thread, as well as this thread, but they were no help. I also checked zipfile's documentation, but it wasn't helpful regarding this issue.

After switching to with zipfile.ZipFile(zip_filename, 'r') as zip_file:, for some unknown and weird reason the program can process a small word list/password list/dictionary but not a large one(?).

What I mean is this: zipv1.zip contains a .txt document named Congrats.txt with the text You have cracked the .zip!. The same .txt is also in zipv2.zip, but there it was placed in a folder named ZIP Contents before being zipped and password-protected. The password is dominique for both zips.

Note that each .zip was generated in 7-Zip using the Deflate compression method and ZipCrypto encryption.

The password is on line 35 of John The Ripper Jr.txt (35 of 52 lines) and on line 1968 of John The Ripper.txt (1968 of 3106 lines).

If you run python Program.py -z zipv1 -f "John The Ripper Jr.txt" in CMD (or the IDE of your choice), it creates a folder named Extracted and places Congrats.txt in it, containing the sentence we set earlier. The same goes for zipv2, except that Congrats.txt ends up in the ZIP Contents folder inside Extracted. There is no trouble extracting the .zips in this case.

But if you try the same thing with John The Ripper.txt, i.e. python Program.py -z zipv1 -f "John The Ripper.txt", it creates the Extracted folder for both zips, just as with John The Ripper Jr.txt, but this time Congrats.txt is empty for both of them for some unknown reason.

My code and all the necessary files are as follows:

import argparse
import multiprocessing
import zipfile

parser = argparse.ArgumentParser(description="Unzips a password protected .zip by performing a brute-force attack.", usage="Program.py -z zip.zip -f file.txt")
# Creates -z arg
parser.add_argument("-z", "--zip", metavar="", required=True, help="Location and the name of the .zip file.")
# Creates -f arg
parser.add_argument("-f", "--file", metavar="", required=True, help="Location and the name of the word list/password list/dictionary.")
args = parser.parse_args()


def extract_zip(zip_filename, password):
    try:
        with zipfile.ZipFile(zip_filename, 'r') as zip_file:
            zip_file.extractall('Extracted', pwd=password)
            print(f"[+] Password for the .zip: {password.decode('utf-8')} \n")
    except:
        # If a password fails, it moves to the next password without notifying the user. If all passwords fail, it will print nothing in the command prompt.
        pass


def main(zip, file):
    if (zip == None) | (file == None):
        # If the args are not used, it displays how to use them to the user.
        print(parser.usage)
        exit(0)
    # Opens the word list/password list/dictionary in "read binary" mode.
    txt_file = open(file, "rb")
    # Allows 8 instances of Python to be ran simultaneously.
    with multiprocessing.Pool(8) as pool:
        # "starmap" expands the tuples as 2 separate arguments to fit "extract_zip"
        pool.starmap(extract_zip, [(zip, line.strip()) for line in txt_file])


if __name__ == '__main__':
    # Program.py -z zipname.zip -f filename.txt
    main(args.zip, args.file)

Program.py

zipv1.zip

zipv2.zip

John The Ripper Jr.txt

John The Ripper.txt

John The Ripper v2.txt

I am unsure why this is happening and cannot find an answer to this issue anywhere. As far as I can tell it is completely unknown, and I can't find a way to debug or solve it.

This keeps happening regardless of the word/password list. I tried generating more .zips with the same Congrats.txt but with different passwords taken from different word lists/password lists/dictionaries. Using the same method with a larger and a smaller version of each .txt, I got the same results as above.

BUT I did find that if I cut out the first 2k words of John The Ripper.txt and save them as a new .txt, say John The Ripper v2.txt, the .zip is extracted successfully: the Extracted folder appears and Congrats.txt is present with its text inside. So I believe it has to do with the lines after the one the password is on, in this case line 1968; maybe the script doesn't stop after line 1968? I am not sure why this works, though. It isn't a solution, but a step towards one, I guess...

So I tried using "pool terminating" code:

import argparse
import multiprocessing
import zipfile

parser = argparse.ArgumentParser(description="Unzips a password protected .zip by performing a brute-force attack using", usage="Program.py -z zip.zip -f file.txt")
# Creates -z arg
parser.add_argument("-z", "--zip", metavar="", required=True, help="Location and the name of the .zip file.")
# Creates -f arg
parser.add_argument("-f", "--file", metavar="", required=True, help="Location and the name of the word list/password list/dictionary.")
args = parser.parse_args()


def extract_zip(zip_filename, password, queue):
    try:
        with zipfile.ZipFile(zip_filename, "r") as zip_file:
            zip_file.extractall('Extracted', pwd=password)
            print(f"[+] Password for the .zip: {password.decode('utf-8')} \n")
            queue.put("Done")  # Signal success
    except:
        # If a password fails, it moves to the next password without notifying the user. If all passwords fail, it will print nothing in the command prompt.
        pass


def main(zip, file):
    if (zip == None) | (file == None):
        print(parser.usage)  # If the args are not used, it displays how to use them to the user.
        exit(0)
    # Opens the word list/password list/dictionary in "read binary" mode.
    txt_file = open(file, "rb")

    # Create a Queue
    manager = multiprocessing.Manager()
    queue = manager.Queue()

    with multiprocessing.Pool(8) as pool:  # Allows 8 instances of Python to be ran simultaneously.
        pool.starmap_async(extract_zip, [(zip, line.strip(), queue) for line in txt_file])  # "starmap_async" expands each tuple into the 3 separate arguments of "extract_zip"
        pool.close()
        queue.get(True)  # Wait for a process to signal success
        pool.terminate()  # Terminate the pool
        pool.join()


if __name__ == '__main__':
    main(args.zip, args.file)  # Program.py -z zip.zip -f file.txt.

Now, with this code both zips are extracted successfully, just like in the previous instances. BUT this time zipv1.zip's Congrats.txt is intact and has the message inside it. The same cannot be said for zipv2.zip, whose Congrats.txt is still empty.

Recommended answer

Sorry for the long pause... It seems you've got yourself into a bit of a pickle.

Recap:

  • Working on a password-protected .zip file
  • Brute force (ciobaneste) is attempted, using passwords from a file
  • The correct password is in that file, but in spite of that, some files aren't properly extracted

The scenario is complex (quite far from an MCVE, I'd say), and there are many things that could be blamed for the behavior.

Let's start with the zipv1.zip / zipv2.zip mismatch. On a closer look, it appears that zipv2 is messed up as well. While this is easy to spot for zipv1 (Congrats.txt being the only file), for zipv2 it is "ZIP Contents/Black-Large.png" that ends up 0-sized.
It is reproducible with any file, and more: it applies to the 1st entry (that is not a dir) returned by zf.namelist.
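To check which member would be hit in a given archive, a minimal sketch along these lines can list the first non-directory entry together with the fields that matter later in this answer. The function name describe_first_file and the dictionary keys are made up for illustration; ZipInfo.is_dir(), ZipInfo.flag_bits, ZipInfo.CRC and ZipInfo.file_size are standard zipfile attributes.

```python
import zipfile


def describe_first_file(zip_path):
    """Return basic facts about the first non-directory member, or None."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip entries such as "ZIP Contents/"
            return {
                "name": info.filename,
                # Bit 0 of the general purpose flags marks an encrypted member.
                "encrypted": bool(info.flag_bits & 0x1),
                # High-order byte of the stored CRC-32, used by the quick
                # password check (when flag bit 3 is set, CPython compares
                # against a byte of the modification time instead).
                "crc_high_byte": (info.CRC >> 24) & 0xFF,
                "size": info.file_size,
            }
    return None
```

Calling describe_first_file("zipv2.zip") on the question's archive should name the member that comes out empty, though that is an expectation based on the observation above, not something verified here.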

So, things start to get a little bit clearer:

  • The file contents are unpacked, because dominique is present in the password file (don't know what happens up to that point)
  • At a later point, the .zip's 1st entry is truncated to 0 bytes

Looking at the exceptions thrown when attempting to extract files using a wrong password, there are 3 types (of which the last 2 can be grouped together):

  1. RuntimeError: Bad password for file ...
  2. Others:
    • zlib.error: Error -3 while decompressing data ...
    • zipfile.BadZipFile: Bad CRC-32 for file ...
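The grouping above can be sketched as a small helper that sorts a caught exception into one of the two groups. The name classify_failure and the labels it returns are hypothetical; the exception types are the ones listed above. One detail worth handling is that NotImplementedError (raised by zipfile for unsupported compression methods) is a subclass of RuntimeError, so it is checked first.

```python
import zipfile
import zlib


def classify_failure(exc):
    """Map an exception raised during extraction to a coarse label."""
    if isinstance(exc, NotImplementedError):
        # Subclass of RuntimeError, but unrelated to the password.
        return "other"
    if isinstance(exc, RuntimeError):
        # Group 1: the password was rejected before any data was written.
        return "rejected"
    if isinstance(exc, (zlib.error, zipfile.BadZipFile)):
        # Group 2: the password got past the first check and failed later.
        return "faulty"
    return "other"
```

In the question's extract_zip, replacing the bare except: with except Exception as exc: and logging classify_failure(exc) would make the second group visible instead of silently swallowing it.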

I created an archive file of my own. For consistency's sake, I'll be using it from now on, but everything would apply to any other file as well.

  • Content:
    • DummyFile0.txt (10 bytes) - containing: 0123456789
    • DummyFile1.txt (10 bytes) - containing: 0000000000
    • DummyFile2.txt (10 bytes) - containing: AAAAAAAAAA

code.py:

#!/usr/bin/env python3

import os
import sys
import zipfile


def main():
    arc_name = sys.argv[1] if len(sys.argv) > 1 else "./arc0.zip"
    # Candidate passwords, one per line, read as bytes (zipfile expects bytes).
    with open("orig/John The Ripper.txt.orig", "rb") as pwd_file:
        pwds = [line.strip() for line in pwd_file]
    print("Unpacking (password protected: dominique) {:s},"
          " using a list of predefined passwords ...".format(arc_name))
    if not os.path.isfile(arc_name):
        raise SystemExit("Archive file must exist!\nExiting.")
    faulty_pwds = []
    good_pwds = []
    with zipfile.ZipFile(arc_name, "r") as zip_file:
        print("Zip names: {:}\n".format(zip_file.namelist()))
        for idx, pwd in enumerate(pwds):
            try:
                zip_file.extractall("Extracted", pwd=pwd)
            except Exception as exc:
                # RuntimeError ("Bad password ...") is the common case: stay silent.
                # Anything else means extraction failed at a later stage, so
                # report it and record the password as "faulty".
                if type(exc) is not RuntimeError:
                    print("Exception caught when using password ({:d}): [{:}] ".format(idx, pwd))
                    print("    {:}: {:}".format(type(exc), exc))
                    faulty_pwds.append(pwd)
            else:
                print("Success using password ({:d}): [{:}] ".format(idx, pwd))
                good_pwds.append(pwd)
    print("\nFaulty passwords: {:}\nGood passwords: {:}".format(faulty_pwds, good_pwds))


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()
      

Output:

      [cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q054532010]> "e:\Work\Dev\VEnvs\py_064_03.06.08_test0\Scripts\python.exe" code.py arc0.zip
      Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32
      
      Unpacking (password protected: dominique) arc0.zip, using a list of predefined passwords ...
      Zip names: ['DummyFile0.txt', 'DummyFile1.txt', 'DummyFile2.txt']
      
      Exception caught when using password (1189): [b'mariah']
          <class 'zlib.error'>: Error -3 while decompressing data: invalid code lengths set
      Exception caught when using password (1446): [b'zebra']
          <class 'zlib.error'>: Error -3 while decompressing data: invalid block type
      Exception caught when using password (1477): [b'1977']
          <class 'zlib.error'>: Error -3 while decompressing data: invalid block type
      Success using password (1967): [b'dominique']
      Exception caught when using password (2122): [b'hank']
          <class 'zlib.error'>: Error -3 while decompressing data: invalid code lengths set
      Exception caught when using password (2694): [b'solomon']
          <class 'zlib.error'>: Error -3 while decompressing data: invalid distance code
      Exception caught when using password (2768): [b'target']
          <class 'zlib.error'>: Error -3 while decompressing data: invalid block type
      Exception caught when using password (2816): [b'trish']
          <class 'zlib.error'>: Error -3 while decompressing data: invalid code lengths set
      Exception caught when using password (2989): [b'coco']
          <class 'zlib.error'>: Error -3 while decompressing data: invalid stored block lengths
      
      Faulty passwords: [b'mariah', b'zebra', b'1977', b'hank', b'solomon', b'target', b'trish', b'coco']
      Good passwords: [b'dominique']
      

Looking at the ZipFile.extractall code, it tries to extract all the members. The 1st one raises an exception, so it starts to become clearer why it behaves the way it does. But why the behavioral difference when attempting to extract items using 2 different wrong passwords?
As seen in the tracebacks of the 2 different exception types, the answer lies somewhere at the end of ZipFile.open.

After more investigation, it turns out the cause is a weakness in the ZIP encryption scheme itself.

According to [UT.CS]: dmitri-report-f15-16.pdf - Password-based encryption in ZIP files (the (last) emphasis is mine):

3.1 Traditional PKWARE encryption

The original encryption scheme, commonly referred to as the PKZIP cipher, was designed by Roger Schaffely [1]. In [5] Biham and Kocher showed that the cipher is weak and demonstrated an attack requiring 13 bytes of plaintext. Further attacks have been developed, some of which require no user provided plaintext at all [6]. The PKZIP cipher is essentially a stream cipher, i.e. input is encrypted by generating a pseudo-random key stream and XOR-ing it with the plaintext. The internal state of the cipher consists of three 32-bit words: key0, key1 and key2. These are initialized to 0x12345678, 0x23456789 and 0x34567890, respectively. A core step of the algorithm involves updating the three keys using a single byte of input...

      ...

Before encrypting a file in the archive, 12 random bytes are first prepended to its compressed contents and the resulting bytestream is then encrypted. Upon decryption, the first 12 bytes need to be discarded. According to the specification, this is done in order to render a plaintext attack on the data ineffective. The specification also states that out of the 12 prepended bytes, only the first 11 are actually random, the last byte is equal to the high order byte of the CRC-32 of the uncompressed contents of the file. This gives the ability to quickly verify whether a given password is correct by comparing the last byte of the decrypted 12 byte header to the high order byte of the actual CRC-32 value that is included in the local file header. This can be done before decrypting the rest of the file.

The algorithm weakness: because the check is done on one byte only, a wrong password has roughly a 1 in 256 chance of producing the same check byte as the correct one. Put differently, among 256 different (and carefully chosen) wrong passwords, there will be at least one that generates the same number as the correct password.

The algorithm discards most of the wrong passwords, but some slip through.
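As a rough back-of-the-envelope check, assuming the decrypted check byte of a wrong password is uniformly distributed (an assumption for illustration, not something the specification promises), the expected number of wrong passwords that slip through a word list can be estimated as follows. The function name expected_false_positives is made up for this sketch.

```python
def expected_false_positives(wrong_passwords, check_bits=8):
    """Expected number of wrong passwords passing a check of `check_bits` bits."""
    return wrong_passwords / (2 ** check_bits)


# The question's larger word list has 3106 lines, one of which is the real password.
print(round(expected_false_positives(3106 - 1), 1))  # prints 12.1
```

The run shown earlier in this answer found 8 such passwords with that list, which is the same order of magnitude; the exact count depends on the archive and on the list.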

Going back: when extraction of a file is attempted using a password:

  • If the "hash" computed from the last byte of the file's 12-byte encryption header differs from the high-order byte of the file's CRC, an exception is thrown
  • But if they are equal:
    • A new file stream is opened for writing (emptying the file if it already exists)
    • Decompression is attempted:
      • For wrong passwords (that have passed the above check), decompression fails (but the file has already been emptied)
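Given that sequence, one possible way to sidestep the truncation without touching zipfile itself is to verify a candidate password entirely in memory and only write to disk once every member has decrypted, inflated and passed its CRC-32 check. This is a minimal sketch, not the fix proposed upstream; the function names password_opens_archive and extract_if_password_ok are hypothetical, and reading each member twice costs extra time.

```python
import zipfile
import zlib


def password_opens_archive(zip_path, password):
    """Return True only if every member can be fully read with `password`."""
    try:
        with zipfile.ZipFile(zip_path) as zf:
            for info in zf.infolist():
                # ZipFile.open raises RuntimeError when the quick check fails.
                with zf.open(info, pwd=password) as member:
                    # Reading to EOF triggers decompression and the CRC-32 check,
                    # but nothing is written to disk.
                    while member.read(1 << 16):
                        pass
    except (RuntimeError, zlib.error, zipfile.BadZipFile):
        return False
    return True


def extract_if_password_ok(zip_path, password, dest="Extracted"):
    """Extract to `dest` only after the password has been verified in memory."""
    if not password_opens_archive(zip_path, password):
        return False
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest, pwd=password)
    return True
```

In the question's script, extract_zip could call extract_if_password_ok(zip_filename, password) and print the password only when it returns True, so that a "faulty" password tried after the correct one no longer empties files that were already extracted.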

As seen from the output above, for my (.zip) file there are 8 passwords that mess it up. Note that:

  • For each archive file the result differs
  • The member file name and content are relevant (at least for the 1st one). Changing either of them will yield different results (for the "same" archive file)

Here's a test based on data from my .zip file:

              >>> import zipfile
              >>>
              >>> zd_coco = zipfile._ZipDecrypter(b"coco")
              >>> zd_dominique = zipfile._ZipDecrypter(b"dominique")
              >>> zd_other = zipfile._ZipDecrypter(b"other")
              >>> cipher = b'\xd1\x86y ^\xd77gRzZ\xee'  # Member (1st) file cipher: 12 bytes starting from archive offset 44
              >>>
              >>> crc = 2793719750  # Member (1st) file CRC - archive bytes: 14 - 17
              >>> hex(crc)
              '0xa684c7c6'
              >>> for zd in (zd_coco, zd_dominique, zd_other):
              ...     print(zd, [hex(zd(c)) for c in cipher])
              ...
              <zipfile._ZipDecrypter object at 0x0000021E8DA2E0F0> ['0x1f', '0x58', '0x89', '0x29', '0x89', '0xe', '0x32', '0xe7', '0x2', '0x31', '0x70', '0xa6']
              <zipfile._ZipDecrypter object at 0x0000021E8DA2E160> ['0xa8', '0x3f', '0xa2', '0x56', '0x4c', '0x37', '0xbb', '0x60', '0xd3', '0x5e', '0x84', '0xa6']
              <zipfile._ZipDecrypter object at 0x0000021E8DA2E128> ['0xeb', '0x64', '0x36', '0xa3', '0xca', '0x46', '0x17', '0x1a', '0xfb', '0x6d', '0x6c', '0x4e']
              >>>  # As seen, the last element of the first 2 arrays (coco and dominique) is 0xA6 (166), which is the same as the first byte of the CRC
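The session above relies on zipfile._ZipDecrypter, a private helper that may change shape between Python versions. A self-contained sketch of the same quick check, following the traditional PKZIP key-update scheme with the initial key values quoted earlier, could look like this. The names decrypt_header and passes_quick_check are made up for illustration, and the check assumes the CRC-based variant described in the quoted report.

```python
def _make_crc_table():
    """Build the standard CRC-32 lookup table (polynomial 0xEDB88320)."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


_CRC_TABLE = _make_crc_table()


def _crc32_step(byte, crc):
    """Feed one byte into a raw CRC-32 state (no initial or final inversion)."""
    return (crc >> 8) ^ _CRC_TABLE[(crc ^ byte) & 0xFF]


def decrypt_header(password, header):
    """Decrypt the 12-byte encryption header with the traditional PKZIP cipher."""
    key0, key1, key2 = 0x12345678, 0x23456789, 0x34567890

    def update_keys(byte):
        nonlocal key0, key1, key2
        key0 = _crc32_step(byte, key0)
        key1 = (key1 + (key0 & 0xFF)) & 0xFFFFFFFF
        key1 = (key1 * 134775813 + 1) & 0xFFFFFFFF
        key2 = _crc32_step(key1 >> 24, key2)

    # The password bytes initialize the cipher state.
    for byte in password:
        update_keys(byte)

    plain = bytearray()
    for byte in header:
        k = key2 | 2
        byte ^= ((k * (k ^ 1)) >> 8) & 0xFF  # XOR with one key-stream byte
        update_keys(byte)                    # state advances on the plaintext byte
        plain.append(byte)
    return bytes(plain)


def passes_quick_check(password, header, crc):
    """True if the last decrypted header byte equals the CRC's high-order byte."""
    return decrypt_header(password, header)[-1] == ((crc >> 24) & 0xFF)


# Values taken from the interactive session above.
header = b'\xd1\x86y ^\xd77gRzZ\xee'
crc = 0xA684C7C6
for pwd in (b"coco", b"dominique", b"other"):
    print(pwd, passes_quick_check(pwd, header, crc))
```

If the sketch matches CPython's implementation, it should report that both coco and dominique pass the check while other does not, mirroring the 0xa6 and 0x4e values printed in the session.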
              

I did some tests with other unpacking engines (with default arguments):

  1. WinRAR: for a wrong password the file is untouched, but for a faulty one it is truncated (same as here)
  2. 7-Zip: it asks the user whether to overwrite the file, and overwrites it regardless of the decompression result
  3. Total Commander's internal (zip) unpacker: same as #2

3. Conclusion

  • I see this as a zipfile bug. Specifying such a faulty (and wrong) password shouldn't overwrite the existing file (if any). Or at least, the behavior should be consistent (for all wrong passwords)
  • A quick browse didn't reveal any existing report on Python's bug tracker
  • I don't see an easy fix, as:
    • The zip algorithm can't be improved (to better check whether a password is OK)
    • I thought of a couple of fixes, but they would either hurt performance or could introduce regressions in some (corner) cases

I've submitted [GitHub]: python/cpython - [3.6] bpo-36247: zipfile - extract truncates (existing) file when bad password provided (zip encryption weakness), which was closed for branch 3.6 (which is in security-fixes-only mode). I'm not sure what its outcome is going to be (in other branches), but in any case it won't be available anytime soon (not in the next few months, let's say).

As an alternative, you could download the patch and apply the changes locally. Check [SO]: Run/Debug a Django application's UnitTests from the mouse right click context menu in PyCharm Community Edition? (@CristiFati's answer) (Patching utrunner section) for how to apply patches on Win (basically, every line that starts with one "+" sign goes in, and every line that starts with one "-" sign goes out). I am using Cygwin, btw.
If you want to keep your Python installation pristine, you could copy zipfile.py from Python's dir to your project (or some "personal") dir and patch that copy instead.
