What is the practical purpose of VOLUME in Dockerfile?


Question

First of all, I want to make it clear I've done due diligence in researching this topic. Very closely related is this SO question, which doesn't really address my confusion.

I understand that specifying VOLUME in a Dockerfile instructs Docker to create an unnamed volume, mapped to the specified directory inside the container, for the lifetime of that container. For example:

# Dockerfile
VOLUME ["/foo"]

This would create a volume to contain any data stored in /foo inside the container. The volume (when viewed via docker volume ls) would show up under a long, random hexadecimal name.

Each time you do docker run, this volume is not reused. This is the key point causing confusion here. To me, the goal of a volume is to contain state persistent across all instances of an image (all containers started from it). So basically if I do this, without explicit volume mappings:

#!/usr/bin/env bash
# Run container for the first time (detached, and named `foo` so that it
# can be stopped and removed by name below)
docker run -d -t --name foo foo

# Kill the container and re-run it again. Note that the previous
# volume would now contain data because services running in `foo`
# would have written data to that volume.
docker container stop foo
docker container rm foo

# Run container a second time
docker run -d -t --name foo foo

I expect the unnamed volume to be reused between the 2 run commands. However, this is not the case. Because I did not explicitly map a volume via the -v option, a new volume is created for each run.
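One way to observe this is to print the mounts of each container and compare the volume names between runs. The following is a minimal sketch, assuming an image named `foo` that declares `VOLUME ["/foo"]`; the helper name `container_mounts` is made up for illustration.

```shell
#!/usr/bin/env bash
# Print one line per mount of a container: type, volume name, destination.
# Argument: container name or ID (hypothetical example: foo).
container_mounts() {
  docker inspect --format '{{range .Mounts}}{{println .Type .Name .Destination}}{{end}}' "$1"
}

# Example usage (assumes the hypothetical image `foo` from the question):
#   docker run -d -t --name foo foo
#   container_mounts foo    # note the random volume name mounted at /foo
#   docker container stop foo && docker container rm foo
#   docker run -d -t --name foo foo
#   container_mounts foo    # a different name: a new volume was created
```

The first volume is not deleted by `docker container rm`; it simply stays on the host without any container attached to it.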

Here's important part number 2: Since I'm required to explicitly specify -v to share persistent state between run commands, why would I ever specify VOLUME in my Dockerfile? Without VOLUME, I can do this (using the previous example):

#!/usr/bin/env bash
# Create a volume for state persistence
docker volume create foo_data

# Run container for the first time (detached, and named `foo` so that it
# can be stopped and removed by name below)
docker run -d -t --name foo -v foo_data:/foo foo

# Kill the container and re-run it again. Note that the previous
# volume would now contain data because services running in `foo`
# would have written data to that volume.
docker container stop foo
docker container rm foo

# Run container a second time
docker run -d -t --name foo -v foo_data:/foo foo

Now, truly, the second container will have data mounted to /foo that was there from the previous instance. I can do this without VOLUME in my Dockerfile. From the command line, I can turn any directory inside the container into a mount to either a bound directory on the host or a volume in Docker.

So my question is: What is the point of VOLUME when you have to explicitly map named volumes to containers via commands on the host anyway? Either I'm missing something or this is just confusing and obfuscated.

Note that all of my assertions here are based on my observations of how docker behaves, as well as what I've gathered from the documentation.

Answer

Instructions like VOLUME and EXPOSE are a bit anachronistic. Named volumes as we know them today were introduced in Docker 1.9, almost three years ago.

Before Docker 1.9, running a container whose image had one or more VOLUME instructions (or using the --volume option) was the only way to create volumes for data sharing or persistence. In fact, it used to be a best practice to create data-only containers whose sole purpose was to hold one or more volumes, and then share those volumes with your application containers using the --volumes-from option. Here are some articles that describe this outdated pattern.

Also, check out moby/moby#17798 (Data-only containers obsolete with docker 1.9.0?) where the change from data-only containers to named volumes was discussed.
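For readers who have not seen it, the data-only container pattern mentioned above looked roughly like this. It is a sketch under assumptions, not a recommendation: the `busybox` base image, the `/foo` path, and the helper names are illustrative choices, and `foo` is the hypothetical application image from the question.

```shell
#!/usr/bin/env bash
# Create (but never start) a container whose only job is to own a volume
# at /foo. Argument: name for the data-only container.
create_data_container() {
  docker create -v /foo --name "$1" busybox /bin/true
}

# Start an application container that mounts every volume owned by the
# data-only container. Arguments: data container name, application image.
run_app_with_data() {
  docker run -d -t --volumes-from "$1" "$2"
}

# Example usage (all names are hypothetical):
#   create_data_container foo_data
#   run_app_with_data foo_data foo
```

With named volumes, `docker volume create foo_data` followed by `-v foo_data:/foo` achieves the same sharing without the extra container.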

Today, I consider the VOLUME instruction as an advanced tool that should only be used for specialized cases, and after careful thought. For example, the official postgres image declares a VOLUME at /var/lib/postgresql/data. This can improve the performance of postgres containers out of the box by keeping the database data out of the layered filesystem. Docker doesn't have to search through all the layers of the container image for file requests at /var/lib/postgresql/data.

However, the VOLUME instruction does come at a cost.

  • Users might not be aware that unnamed volumes are being created, or that those volumes continue to take up storage space on their Docker host after the containers are removed.
  • There is no way to remove a volume declared in a Dockerfile. Downstream images cannot add data to paths where volumes exist.
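
The first cost can be kept in check from the host. The helpers below are a small sketch of how leftover volumes might be found and avoided; the function names are made up for illustration. Note that the `dangling=true` filter matches every volume not referenced by a container, named or unnamed.

```shell
#!/usr/bin/env bash
# Print the names of volumes that no container references any more.
list_dangling_volumes() {
  docker volume ls --quiet --filter dangling=true
}

# Remove a stopped container together with its anonymous volumes.
# Argument: container name or ID.
remove_container_and_volumes() {
  docker container rm --volumes "$1"
}

# Example usage (hypothetical container name `foo`):
#   docker container stop foo
#   remove_container_and_volumes foo
#   list_dangling_volumes
```

Starting a container with `docker run --rm` has a similar effect: its anonymous volumes are removed when the container exits.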

The latter issue results in problems like these.

For the GitLab question, someone wants to extend the GitLab image with pre-configured data for testing purposes, but it's impossible to commit that data in a downstream image because of the VOLUME at /var/opt/gitlab in the parent image.

tl;dr: VOLUME was designed for a world before Docker 1.9. Best to just leave it out.
