Multiple RUN vs. single chained RUN in Dockerfile, which is better?


Question

Dockerfile.1 executes multiple RUNs:

FROM busybox
RUN echo This is the A > a
RUN echo This is the B > b
RUN echo This is the C > c

Dockerfile.2 joins them:

FROM busybox
RUN echo This is the A > a &&\
    echo This is the B > b &&\
    echo This is the C > c

Each RUN creates a layer, so I always assumed that fewer layers is better and thus Dockerfile.2 is better.

This is obviously true when a RUN removes something added by a previous RUN (i.e. yum install nano && yum clean all), but in cases where every RUN adds something, there are a few points we need to consider:
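
For instance, a minimal sketch of that cleanup case (the package is only an example): because the install and the cleanup share one RUN, the package-manager cache never lands in any layer, whereas a separate RUN yum clean all would only hide it in a later layer without reclaiming the space.

FROM centos:7
# Install and clean up in the same layer, so the yum cache is never shipped
RUN yum install -y nano \
    && yum clean all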

  1. Layers are supposed to just add a diff above the previous one, so if the later layer does not remove something added in a previous one, there should not be much disk space saving advantage between both methods.

  2. Layers are pulled in parallel from Docker Hub, so Dockerfile.1, although probably slightly bigger, would theoretically get downloaded faster.

  3. If adding a 4th line (i.e. echo This is the D > d) and locally rebuilding, Dockerfile.1 would build faster thanks to cache, but Dockerfile.2 would have to run all 4 commands again.
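
To make the caching point concrete, this is how the 4th line would land in each file (comments added for illustration):

# Dockerfile.1 with the extra line
FROM busybox
# Unchanged lines: these three layers come straight from the build cache
RUN echo This is the A > a
RUN echo This is the B > b
RUN echo This is the C > c
# Only this new layer is actually built
RUN echo This is the D > d

# Dockerfile.2 with the extra line
FROM busybox
# The single RUN line changed, so all four commands execute again
RUN echo This is the A > a &&\
    echo This is the B > b &&\
    echo This is the C > c &&\
    echo This is the D > d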

So, the question is: which is the better way to write a Dockerfile?

Answer

When possible, I always merge together commands that create files with commands that delete those same files into a single RUN line. This is because each RUN line adds a layer to the image; the output is quite literally the filesystem changes that you could view with docker diff on the temporary container it creates. If you delete a file that was created in a different layer, all the union filesystem does is register the filesystem change in a new layer; the file still exists in the previous layer and is shipped over the network and stored on disk. So if you download source code, extract it, compile it into a binary, and then delete the tgz and source files at the end, you really want this all done in a single layer to reduce the image size.
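
As a sketch of that download-build-cleanup pattern (the URL, archive name, and build commands are hypothetical), everything happens inside a single RUN so neither the tarball nor the extracted sources survive into the image:

RUN wget -O /tmp/app.tgz https://example.com/app.tgz \
    && mkdir /tmp/app-src \
    && tar -xzf /tmp/app.tgz -C /tmp/app-src --strip-components=1 \
    && make -C /tmp/app-src install \
    && rm -rf /tmp/app.tgz /tmp/app-src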

Next, I personally split up layers based on their potential for reuse in other images and on expected caching usage. If I have 4 images, all with the same base image (e.g. debian), I may pull a collection of common utilities used by most of those images into the first RUN command so that the other images benefit from caching.
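
A sketch of that reuse (base image and package names are only examples): two Dockerfiles that start with an identical first RUN, so the layer is built once and then served from the cache when the second image builds.

# Dockerfile for image A
FROM debian:bookworm
RUN apt-get update && apt-get install -y curl ca-certificates && rm -rf /var/lib/apt/lists/*
COPY service-a /usr/local/bin/service-a

# Dockerfile for image B reuses the cached layer above
FROM debian:bookworm
RUN apt-get update && apt-get install -y curl ca-certificates && rm -rf /var/lib/apt/lists/*
COPY service-b /usr/local/bin/service-b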

Order in the Dockerfile is important when looking at image cache reuse. I look at any components that will update very rarely, possibly only when the base image updates, and put those high up in the Dockerfile. Towards the end of the Dockerfile, I include any commands that will run quickly and may change frequently, e.g. adding a user with a host-specific UID or creating folders and changing permissions. If the container includes interpreted code (e.g. JavaScript) that is being actively developed, that gets added as late as possible so that a rebuild only runs that single change.
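
A sketch of that ordering (package, user, and path names are assumptions): the rarely-changing install sits at the top, the quick host-specific steps and the actively developed code sit at the bottom, so day-to-day edits only rebuild the last layers.

FROM debian:bookworm
# Rarely changes: base packages, kept high up for maximum cache reuse
RUN apt-get update \
    && apt-get install -y --no-install-recommends nginx \
    && rm -rf /var/lib/apt/lists/*
# Quick and more likely to change: host-specific user and folders
RUN useradd -u 1234 appuser \
    && mkdir -p /srv/app \
    && chown appuser /srv/app
# Actively developed code goes last so an edit only invalidates this layer
COPY app/ /srv/app/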

In each of these groups of changes, I consolidate as best I can to minimize layers. So if there are 4 different source code folders, those get placed inside a single folder so they can be added with a single command. Any package installs from something like apt-get are merged into a single RUN when possible to minimize the amount of package manager overhead (updating and cleaning up).
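
For example (folder and package names are assumptions), the source folders live under one parent so a single COPY brings them all in, and the package-manager work is merged into one RUN:

FROM debian:bookworm
# One RUN for all package-manager overhead instead of one per package
RUN apt-get update \
    && apt-get install -y --no-install-recommends git curl jq unzip \
    && rm -rf /var/lib/apt/lists/*
# src/ holds the four source folders, added with a single command
COPY src/ /opt/src/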

Update for multi-stage builds:

I worry much less about reducing image size in the non-final stages of a multi-stage build. When these stages aren't tagged and shipped to other nodes, you can maximize the likelihood of cache reuse by splitting each command onto a separate RUN line.

However, this isn't a perfect solution for squashing layers, since all you copy between stages are the files, and not the rest of the image metadata like environment variable settings, entrypoint, and command. And when you install packages in a Linux distribution, the libraries and other dependencies may be scattered throughout the filesystem, making it difficult to copy all of the dependencies.

Because of this, I use multi-stage builds as a replacement for building binaries on a CI/CD server, so that my CI/CD server only needs the tooling to run docker build, and does not need a jdk, nodejs, go, or any other compile tools installed.
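
A hedged sketch of that workflow with Go (the module layout and binary name are assumptions): the compile toolchain exists only in the build stage, which is split into separate RUN lines for cache reuse as described above, while the runtime stage copies in nothing but the binary and has to declare its own ENTRYPOINT, since stage metadata does not carry over.

# Build stage: holds the Go toolchain, split into separate RUN lines for caching
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app ./cmd/app

# Runtime stage: no compiler; only the files named in COPY --from come across
FROM alpine:3.19
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]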
