Multiple RUN vs. single chained RUN in Dockerfile, which is better?

Question

Dockerfile.1 executes multiple RUN instructions:

FROM busybox
RUN echo This is the A > a
RUN echo This is the B > b
RUN echo This is the C > c

Dockerfile.2 chains them into a single RUN:

FROM busybox
RUN echo This is the A > a && \
    echo This is the B > b && \
    echo This is the C > c

Each RUN creates a layer, so I always assumed that fewer layers are better and thus Dockerfile.2 is better.

This is obviously true when a RUN removes something added by a previous RUN (e.g. yum install nano && yum clean all), but in cases where every RUN adds something, there are a few points we need to consider:

  1. Layers are supposed to just add a diff on top of the previous one, so if a later layer does not remove something added in an earlier one, neither method should save much disk space over the other.

  2. Layers are pulled in parallel from Docker Hub, so Dockerfile.1, although probably slightly bigger, would theoretically download faster.

  3. If I add a 4th instruction (e.g. echo This is the D > d) and rebuild locally, Dockerfile.1 would build faster thanks to the cache, but Dockerfile.2 would have to run all 4 commands again.

So, the question is: which is the better way to write a Dockerfile?

Recommended answer

When possible, I always merge commands that create files with commands that delete those same files into a single RUN line. This is because each RUN line adds a layer to the image; the output is quite literally the filesystem changes that you could view with docker diff on the temporary container it creates. If you delete a file that was created in a different layer, all the union filesystem does is register the filesystem change in a new layer. The file still exists in the previous layer, is shipped over the network, and is stored on disk. So if you download source code, extract it, compile it into a binary, and then delete the tgz and source files at the end, you really want this all done in a single layer to reduce image size.
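
A minimal sketch of the download, extract, compile, and delete sequence in one layer. The tarball URL, the `hello` binary name, and the assumption that the archive has a single top-level directory with a plain Makefile are all hypothetical; the `gcc` base image is chosen only because it already ships curl, tar, and make.

```dockerfile
FROM gcc:13

# Everything happens in one RUN, so the tarball and the source tree
# never get recorded in any layer of the final image.
# URL, archive layout, and binary name are illustrative assumptions.
RUN curl -fsSL https://example.com/hello-1.0.tar.gz -o /tmp/hello.tgz \
 && mkdir /tmp/hello \
 && tar -xzf /tmp/hello.tgz -C /tmp/hello --strip-components=1 \
 && make -C /tmp/hello \
 && cp /tmp/hello/hello /usr/local/bin/hello \
 && rm -rf /tmp/hello.tgz /tmp/hello
```

If the final `rm -rf` were moved to its own RUN line, the earlier layer would still contain the tarball and sources, and the image would not get smaller.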

Next, I personally split up layers based on their potential for reuse in other images and their expected cache usage. If I have 4 images that all share the same base image (e.g. debian), I may pull a collection of utilities common to most of those images into the first RUN command so the other images benefit from caching.

Order in the Dockerfile is important when looking at image cache reuse. I look for any components that will update very rarely, possibly only when the base image updates, and put those high up in the Dockerfile. Towards the end of the Dockerfile, I include any commands that run quickly and may change frequently, e.g. adding a user with a host-specific UID or creating folders and changing permissions. If the container includes interpreted code (e.g. JavaScript) that is being actively developed, that gets added as late as possible so that a rebuild only runs that single change.
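
One way this ordering might look, as a sketch under assumptions: the `HOST_UID` build argument, the `app` user, and a local `src/` directory containing `index.js` are hypothetical names introduced for illustration.

```dockerfile
FROM debian:bookworm-slim

# Rarely changes: the runtime itself. Kept at the top so this layer
# stays cached across most rebuilds.
RUN apt-get update \
 && apt-get install -y --no-install-recommends nodejs \
 && rm -rf /var/lib/apt/lists/*

# Quick to run and may change per host: user creation and permissions.
# HOST_UID is a hypothetical build argument.
ARG HOST_UID=1000
RUN useradd --uid "$HOST_UID" --create-home app \
 && mkdir -p /app \
 && chown app:app /app

# Changes most often: the actively developed interpreted code goes last,
# so editing it only invalidates this final COPY layer.
COPY --chown=app:app src/ /app/

USER app
CMD ["node", "/app/index.js"]
```

Building with a different UID, for example `docker build --build-arg HOST_UID=1001 .`, would reuse the cached package layer and rerun only the instructions from the user-creation RUN onward.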

Within each of these groups of changes, I consolidate as best I can to minimize layers. So if there are 4 different source code folders, those get placed inside a single folder so they can be added with a single command. Any package installs from something like apt-get are merged into a single RUN when possible to minimize package manager overhead (updating and cleaning up).
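
A sketch of that consolidation, assuming the package list shown and a hypothetical local `src/` folder that holds the four source directories:

```dockerfile
FROM debian:bookworm-slim

# One update, one install, one cleanup: the package index is fetched and
# removed a single time instead of once per package.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      ca-certificates \
      curl \
      git \
 && rm -rf /var/lib/apt/lists/*

# src/ is assumed to contain the four source folders, so a single COPY
# (and a single layer) brings them all in.
COPY src/ /opt/src/
```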

Update for multi-stage builds:

I worry much less about reducing image size in the non-final stages of a multi-stage build. When these stages aren't tagged and shipped to other nodes, you can maximize the likelihood of cache reuse by splitting each command into a separate RUN line.

However, this isn't a perfect solution for squashing layers, since all you copy between stages are the files, not the rest of the image metadata such as environment variable settings, entrypoint, and command. And when you install packages in a Linux distribution, the libraries and other dependencies may be scattered throughout the filesystem, making it difficult to copy all the dependencies.

Because of this, I use multi-stage builds as a replacement for building binaries on a CI/CD server, so that my CI/CD server only needs the tooling to run docker build, and does not need a JDK, Node.js, Go, or any other compile tools installed.
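
A rough sketch of such a multi-stage build, assuming a Go module with `go.mod` and `go.sum` in the build context; the stage name `build`, the output path `/out/app`, and the `APP_ENV` variable are hypothetical.

```dockerfile
# Build stage: not tagged or shipped, so each step gets its own RUN line
# to maximize cache reuse.
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go vet ./...
RUN CGO_ENABLED=0 go build -o /out/app .

# Final stage: only the compiled file is copied across. Metadata such as
# ENV and ENTRYPOINT is not inherited from the build stage, so it has to
# be declared again here.
FROM debian:bookworm-slim
COPY --from=build /out/app /usr/local/bin/app
ENV APP_ENV=production
ENTRYPOINT ["/usr/local/bin/app"]
```

With this layout the CI/CD host would only need to run `docker build`; the Go toolchain lives entirely in the `build` stage.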
