Can we run a multi-process program in Docker?

Question

I have some code that uses multiple processes, like this:

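from multiprocessing import Pool

# handle_job and job_list are defined elsewhere in the program.
pool = Pool(processes=100)
result = []

for job in job_list:
    # args must be a tuple, so pass (job,) rather than (job)
    result.append(pool.apply_async(handle_job, (job,)))

pool.close()  # no more jobs will be submitted
pool.join()   # wait for all worker processes to finish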

This program does heavy computation on a very large data set, so we need multiple processes to handle the jobs concurrently and improve performance.

I have been told that, to the host system, a Docker container is just one process. So I am wondering how my multiple processes will be handled in Docker.

Here are my concerns:

  1. Since the container is just one process, will my multi-process code become multi-threaded code inside that process?

  2. Will performance degrade? The reason I use multiple processes is to get the work done concurrently for better performance.

Recommended answer

I suspect much of the confusion comes from thinking of a container as a lightweight VM. Instead, think of a Linux container as a way to run a process with some settings for namespaces and cgroups.
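
Because the cgroup and CPU settings apply to every process in the container, they can also cap how much CPU a pool of workers actually gets. The following is a minimal sketch rather than anything from the original answer: the helper names `available_cpus` and `suggested_pool_size` are hypothetical, and it assumes that sizing the pool to the usable CPU count is appropriate for a CPU-bound job. On Linux, `os.sched_getaffinity` reflects CPU pinning such as `docker run --cpuset-cpus`, but it does not reflect a CPU quota set with `docker run --cpus`.

```python
import os


def available_cpus():
    """Return the number of CPUs this process is allowed to run on."""
    # sched_getaffinity exists on Linux only; fall back to cpu_count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def suggested_pool_size(max_workers=100):
    """Cap the requested worker count at the number of usable CPUs."""
    return max(1, min(max_workers, available_cpus()))


if __name__ == "__main__":
    print("usable CPUs:", available_cpus())
    print("suggested pool size:", suggested_pool_size())
```

The result could be passed as `Pool(processes=suggested_pool_size())` in place of a fixed value such as 100.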

One of those namespaces is the pid namespace. When it is configured, the first process in that namespace appears as pid 1 from within the namespace. From inside one pid namespace, you cannot see processes in other pid namespaces or in the host namespace. On the host, outside of any container namespace, you see all processes, including those running inside a namespace.

When you fork a new process, it inherits the same namespaces and cgroups, so it gets a new pid within the pid namespace. This lets you run multiple processes just as in any other Linux environment. Inside the container, you can run the ps command (assuming it is included in your image) and see multiple processes running:

$ docker run -it --rm busybox /bin/sh
/ # sleep 30s &
/ # ps -ef
PID   USER     TIME  COMMAND
    1 root      0:00 /bin/sh
    7 root      0:00 sleep 30s
    8 root      0:00 ps -ef
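
The same check can be made from Python, which speaks directly to the first concern in the question: pool workers are separate processes with their own pids, not threads. This is a minimal sketch, not part of the original answer. The helper names `pid_namespace`, `worker_info`, and `collect_worker_info` are hypothetical, it assumes a Linux system where `/proc/self/ns/pid` is readable, and it explicitly selects the `fork` start method (Unix only) to match the fork behavior described above.

```python
import multiprocessing
import os


def pid_namespace():
    """Return this process's pid namespace id, or None if /proc is unavailable."""
    try:
        return os.readlink("/proc/self/ns/pid")
    except OSError:
        return None


def worker_info(_job):
    # Runs inside a worker process: report its pid and pid namespace.
    return os.getpid(), pid_namespace()


def collect_worker_info(num_workers=4, num_jobs=32):
    # Select "fork" explicitly so workers are created with fork().
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=num_workers) as pool:
        return set(pool.map(worker_info, range(num_jobs)))


if __name__ == "__main__":
    print("parent:", os.getpid(), pid_namespace())
    for pid, ns in sorted(collect_worker_info()):
        print("worker:", pid, ns)
```

Run inside a container, each worker should report a pid different from the parent's while sharing the parent's pid namespace.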

The advice to run only a single process does not stem from multi-threaded apps. It stems from people treating the container as a lightweight VM and spawning multiple applications that have no hard dependency on each other, such as a web server, a database, and a mail server. When this is done, there are a couple of key issues:

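  • Container logs become useless. Either they are intermixed, with multiple processes all writing to the same stdout/stderr, or they are empty because the logs are written to the container filesystem, which is often lost.
  • Error handling is problematic. If the mail server has an error, should you take down the database and restart it to try to correct the issue? And if you don't kill the entire container, how do you know when the mail server is down?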

In short, container management tooling is designed around one application per container. If you break that assumption, you get to keep both pieces when the tooling doesn't support your use case.

A few notes:

  • Once pid 1 exits, your container ends, regardless of whether your forked processes are still running. This means all processes are killed and reaped.
  • Typically on Linux, when a parent process dies without waiting on its child pids, the resulting zombie processes are eventually reaped by the init process running as pid 1. This reaping does not cross the pid namespace boundary, so if you fork child processes, make sure pid 1 inside the container waits on those child processes to clean them up. A common pid 1 process for this task is tini (init spelled backwards). There's even a flag to have docker run it for you (--init).
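
The reaping duty described in the second note can be sketched in Python. This only illustrates what a pid 1 process has to do, under stated assumptions: the helper names `spawn_child` and `reap_children` are hypothetical, and the code relies on `os.fork`, which is Unix only. A real container would more likely rely on tini or the --init flag mentioned above.

```python
import os
import time


def spawn_child(exit_code):
    """Fork a child that exits immediately with the given code."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit without running parent cleanup
    return pid  # parent: return the child's pid


def reap_children(expected, timeout=5.0):
    """Wait on any exited children, returning {pid: exit_code}."""
    reaped = {}
    deadline = time.monotonic() + timeout
    while len(reaped) < expected and time.monotonic() < deadline:
        try:
            # -1 means "any child"; WNOHANG returns at once if none has exited.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left to wait for
        if pid == 0:
            time.sleep(0.01)  # children exist but none has exited yet
            continue
        reaped[pid] = os.WEXITSTATUS(status)
    return reaped


if __name__ == "__main__":
    child_pids = [spawn_child(code) for code in (0, 3)]
    for pid, code in reap_children(len(child_pids)).items():
        print(f"reaped pid {pid} with exit code {code}")
```

Until the parent calls `os.waitpid`, each exited child remains a zombie entry in the process table, which is the cleanup that pid 1 is expected to perform.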
