How to verify if the content of two Docker images is exactly the same?


Problem description

How can we determine that two Docker images have exactly the same file system structure, and that the content of corresponding files is the same, irrespective of file timestamps?

I tried comparing the image IDs, but they differ even when building from the same Dockerfile and a clean local repository. To test this, I built one image, cleaned the local repository, touched one of the files to change its modification date, and then built a second image; the two image IDs did not match. I used Docker 17.06 (the latest version, I believe).
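A plausible explanation, stated here as an assumption rather than a verified account of Docker's internals, is that image layers are stored as tar archives and a tar header records each file's modification time. The sketch below does not involve Docker at all. It builds two tiny in-memory archives (the path app/hello.txt and both timestamps are made up for illustration) and shows that hashing the raw archive bytes picks up the timestamp even though the file content is identical.

```python
import hashlib
import io
import tarfile


def make_tar(mtime):
    """Build a one-file tar archive in memory with the given modification time."""
    data = b"hello\n"
    info = tarfile.TarInfo(name="app/hello.txt")  # hypothetical path
    info.size = len(data)
    info.mtime = mtime
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def raw_digest(archive_bytes):
    """Hash the archive exactly as stored, headers included."""
    return hashlib.sha256(archive_bytes).hexdigest()


first = make_tar(mtime=1_500_000_000)
second = make_tar(mtime=1_500_000_060)  # same content, "touched" a minute later
print(raw_digest(first) == raw_digest(second))  # prints False
```

This is why a comparison that ignores timestamps has to look inside the archive instead of hashing it whole.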

Recommended answer

After some research I came up with a solution which is fast and clean per my tests.

The overall solution is this:


  1. Create a container for your image via docker create ...
  2. Export its entire file system to a tar archive via docker export ...
  3. Pipe the archive's directory names, symlink names, symlink contents, file names, and file contents to a hash function (e.g., MD5)
  4. Compare the hashes of different images to verify whether their contents are equal

That's it.

Technically, this can be done as follows:

1) Create a file named md5docker and make it executable, e.g., chmod +x md5docker:

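#!/bin/sh
# Hash the exported file system of image $1 (md5 is the BSD/macOS tool; Linux usually has md5sum).
dir=$(dirname "$0")
cid=$(docker create "$1") || exit 1
docker export "$cid" | "$dir/tarcat" | md5
docker rm "$cid" > /dev/null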

2) Create a file named tarcat and make it executable, e.g., chmod +x tarcat:

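#!/usr/bin/env python3
# coding=utf-8
"""Read a tar archive on stdin; write entry names, link targets and file contents to stdout."""
import sys
import tarfile

if __name__ == '__main__':
    # "r|*" reads the archive as a non-seekable stream, which is what a pipe provides.
    with tarfile.open(fileobj=sys.stdin.buffer, mode="r|*") as tar:
        for tarinfo in tar:
            if tarinfo.isfile():
                # flush=True keeps the name ahead of the raw bytes written below.
                print(tarinfo.name, flush=True)
                with tar.extractfile(tarinfo) as file:
                    sys.stdout.buffer.write(file.read())
            elif tarinfo.isdir():
                print(tarinfo.name, flush=True)
            elif tarinfo.issym() or tarinfo.islnk():
                # Symbolic and hard links: emit the link name and its target.
                print(tarinfo.name, flush=True)
                print(tarinfo.linkname, flush=True)
            else:
                # Other entry types (devices, FIFOs, ...) are reported on stderr and skipped.
                print("\33[0;31mIGNORING:\33[0m ", tarinfo.name, file=sys.stderr)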

3) Now invoke ./md5docker <image>, where <image> is your image name or id, to compute an MD5 hash of the entire file system of your image.

To verify that two images have the same contents, just check that their hashes, as computed in step 3), are equal.
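The comparison itself can be scripted. The following is a minimal sketch, assuming the md5docker script from step 1) sits in the current directory and prints the digest as the first word of its output; the function names image_hash and same_content are made up for this example.

```python
import subprocess


def image_hash(image, script="./md5docker"):
    """Run the md5docker script for one image and return the digest it prints."""
    result = subprocess.run(
        [script, image], check=True, capture_output=True, text=True
    )
    fields = result.stdout.split()
    if not fields:
        raise RuntimeError(f"no hash produced for {image!r}")
    # md5sum-style tools append a file name after the digest; keep only the digest.
    return fields[0]


def same_content(image_a, image_b, hasher=image_hash):
    """Return True when both images hash to the same value."""
    return hasher(image_a) == hasher(image_b)
```

Calling same_content with two image names or IDs returns True only when both hashes match. The hasher parameter exists so the comparison can be tried out without Docker by passing a stand-in function.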

Note that this solution treats only the directory structure, regular file contents, and links (symbolic and hard) as content. If you need more, extend the tarcat script with additional elif clauses that test for the content you wish to include (see Python's tarfile module, and look for the TarInfo.isXXX() methods corresponding to the needed content).
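As an illustration of such an extension, here is a hedged sketch of a helper that an extra elif branch in tarcat could call to cover FIFOs and device nodes. The helper name special_fields and the tags "fifo", "chr" and "blk" are invented for this example; only the TarInfo methods and attributes come from the tarfile module.

```python
import tarfile


def special_fields(member):
    """Return hash input for FIFOs and device nodes, or None for any other entry type."""
    if member.isfifo():
        return [member.name, "fifo"]
    if member.ischr() or member.isblk():
        kind = "chr" if member.ischr() else "blk"
        # Device numbers identify the device, so they belong in the hash input.
        return [member.name, f"{kind} {member.devmajor}:{member.devminor}"]
    return None


# Hypothetical entry resembling /dev/null (character device 1:3).
null_dev = tarfile.TarInfo("dev/null")
null_dev.type = tarfile.CHRTYPE
null_dev.devmajor, null_dev.devminor = 1, 3
print(special_fields(null_dev))  # prints ['dev/null', 'chr 1:3']
```

Inside tarcat, the returned strings would be printed one per line, in the same way the script already prints link names and targets.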

The only limitation I see in this solution is its dependency on Python (I am using Python 3, but it should be very easy to adapt to Python 2). A better solution without any dependency, and probably faster (hey, this is already very fast), would be to write tarcat in a language that supports static linking, so that a single standalone executable is enough (i.e., one that requires nothing beyond the OS itself). I leave this as a future exercise in C, Rust, OCaml, Haskell, you choose.

Note that if MD5 does not suit your needs, you can simply replace md5 in the first script with your preferred hash utility.
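An alternative to swapping the external utility is to do the hashing in Python with hashlib, which lets the algorithm be chosen by name. This is a sketch under assumptions, not part of the original scripts: the function name hash_tar_stream is made up, and it feeds the hash the same kinds of fields that tarcat prints, so the digests it produces are only meant to be compared with each other.

```python
import hashlib
import io
import tarfile


def hash_tar_stream(fileobj, algorithm="sha256"):
    """Hash entry names, link targets and file contents of a tar stream, ignoring timestamps."""
    digest = hashlib.new(algorithm)

    def add_line(text):
        # Names go in as UTF-8 followed by a newline, mirroring print() in tarcat.
        digest.update(text.encode("utf-8", "surrogateescape") + b"\n")

    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if member.isfile():
                add_line(member.name)
                with tar.extractfile(member) as content:
                    # Read in chunks so large files are not loaded into memory at once.
                    for chunk in iter(lambda: content.read(65536), b""):
                        digest.update(chunk)
            elif member.isdir():
                add_line(member.name)
            elif member.issym() or member.islnk():
                add_line(member.name)
                add_line(member.linkname)
    return digest.hexdigest()
```

In a script fed by docker export, the call would be hash_tar_stream(sys.stdin.buffer), optionally with another algorithm name such as "sha512".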

Hope this helps anyone reading.
