IPC shared memory across Python scripts in separate Docker containers

Question

I have written a neural network classifier that takes in massive images (~1-3 GB apiece), patches them up, and passes the patches through the network individually. Training was going really slowly, so I benchmarked it and found that it was taking ~50s to load the patches from one image into memory (using the Openslide library), and only ~.5 s to pass them through the model.

However, I'm working on a supercomputer with 1.5Tb of RAM of which only ~26 Gb is being utilized. The dataset is a total of ~500Gb. My thinking is that if we could load the entire dataset into memory it would speed up training tremendously. But I am working with a research team and we are running experiments across multiple Python scripts. So ideally, I would like to load the entire dataset into memory in one script and be able to access it across all scripts.
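
For context, the patch loading being benchmarked is roughly of this shape - a minimal sketch using OpenSlide's read_region, where the patch size is a placeholder and the level and coordinates mirror the ones used in the scripts further down:

import openslide

slide = openslide.OpenSlide('path/to/normal_042.tif')
patches = []
for (x, y) in [(14336, 10752), (9408, 18368), (8064, 25536), (16128, 14336)]:
    # read_region takes level-0 coordinates and returns a PIL RGBA image
    patches.append(slide.read_region((x, y), 2, (256, 256)))
slide.close()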

More details:

  • We run our individual experiments in separate Docker containers (on the same machine), so the dataset has to be accessible across multiple containers.
  • The dataset is the Camelyon16 Dataset; images are stored in .tif format.
  • We just need to read the images, no need to write.
  • We only need to access small portions of the dataset at a time.

I have found many posts about how to share Python objects or raw data in memory across multiple Python scripts:

Server Processes with SyncManager and BaseManager in the multiprocessing module | Example 1 | Example 2 | Docs - Server Processes | Docs - SyncManagers

  • Positives: Can be shared by processes on different computers over a network (can it be shared by multiple containers?)
  • Possible issue: slower than using shared memory, according to the docs. If we share memory across multiple containers using a client/server, will that be any faster than all of the scripts reading from disk?
  • Possible issue: according to this answer, the Manager object pickles objects before sending them, which could slow things down.

The mmap module | Docs

  • Possible issue: mmap maps the file to virtual memory, not physical memory - it creates a temporary file (see the sketch after this list).
  • Possible issue: because we use only a small portion of the dataset at a time, and virtual memory keeps the entire dataset on disk, we run into thrashing issues and the program slows to a crawl.
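
For reference, memory-mapping one of the image files read-only looks roughly like this (a minimal sketch; the path is a placeholder):

import mmap

with open('path/to/normal_042.tif', 'rb') as f:
    # Map the whole file read-only; pages are only brought into RAM when touched
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = mm[:8]  # slicing reads bytes out of the mapping
    mm.close()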

Pyro4 (client-server for Python objects) | Docs
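
A Pyro4-based data server would look roughly like this - a sketch only, with a hypothetical class, object id, and port, and with the caveat that, like the Manager approach, Pyro4 serializes return values over the connection:

# pyro_server.py
import Pyro4

@Pyro4.expose
class PatchStore(object):
    def __init__(self):
        self.patches = {'image_0': 1}  # placeholder for real patch data
    def get(self, key):
        return self.patches[key]

daemon = Pyro4.Daemon(host='0.0.0.0', port=9090)
daemon.register(PatchStore(), objectId='patchstore')
daemon.requestLoop()

# pyro_client.py
import Pyro4
store = Pyro4.Proxy('PYRO:patchstore@<server-address>:9090')
print(store.get('image_0'))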

The sysv_ipc module for Python. This demo looks promising.

  • Possible issue: possibly just a lower-level exposure of what the built-in multiprocessing module already provides? (A basic usage sketch follows.)
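
For what it's worth, basic sysv_ipc usage (as in the linked demo) is just a writer and a reader agreeing on a key; System V segments live in the kernel's IPC namespace, which is exactly what docker's --ipc option shares. A minimal sketch, with the key and sizes made up:

# writer.py
import sysv_ipc

KEY = 424242  # arbitrary key both sides agree on
mem = sysv_ipc.SharedMemory(KEY, sysv_ipc.IPC_CREAT, size=1024 * 1024)
mem.write(b'hello from the data server')
# The segment persists until explicitly removed with mem.remove()

# reader.py
import sysv_ipc

mem = sysv_ipc.SharedMemory(424242)  # attach to the existing segment
print(mem.read(26))                  # b'hello from the data server'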

I also found this list of options for IPC/networking in Python.

Some discuss server-client setups, some discuss serialization/deserialization, which I'm afraid will take longer than just reading from disk. None of the answers I've found address my question about whether these will result in a performance improvement on I/O.

Not only do we need to share Python objects/memory across scripts; we need to share them across Docker containers.

The Docker documentation explains the --ipc flag pretty well. What makes sense to me according to the documentation is running:

docker run -d --ipc=shareable data-server
docker run -d --ipc=container:data-server data-client

But when I run my client and server in separate containers with an --ipc connection set up as described above, they are unable to communicate with each other. The SO questions I've read (1, 2, 3, 4) don't address integration of shared memory between Python scripts in separate Docker containers.

  • 1: Would any of these provide faster access than reading from disk? Is it even reasonable to think that sharing data in memory across processes/containers would improve performance?
  • 2: Which would be the most appropriate solution for sharing data in memory across multiple Docker containers?
  • 3: How to integrate memory-sharing solutions from Python with docker run --ipc=<mode>? (is a shared IPC namespace even the best way to share memory across docker containers?)
  • 4: Is there a better solution than these to fix our problem of large I/O overhead?

This is my naive approach to memory sharing between Python scripts in separate containers. It works when the Python scripts are run in the same container, but not when they are run in separate containers.

server.py

from multiprocessing.managers import SyncManager
import multiprocessing

patch_dict = {}

image_level = 2
image_files = ['path/to/normal_042.tif']
region_list = [(14336, 10752),
               (9408, 18368),
               (8064, 25536),
               (16128, 14336)]

def load_patch_dict():

    for i, image_file in enumerate(image_files):
        # We would load the image files here. As a placeholder, we just add `1` to the dict
        patches = 1
        patch_dict.update({'image_{}'.format(i): patches})

def get_patch_dict():
    return patch_dict

class MyManager(SyncManager):
    pass

if __name__ == "__main__":
    load_patch_dict()
    port_num = 4343
    MyManager.register("patch_dict", get_patch_dict)
    manager = MyManager(("127.0.0.1", port_num), authkey=b"password")
    # Set the authkey because it doesn't set properly when we initialize MyManager
    multiprocessing.current_process().authkey = b"password"
    manager.start()
    input("Press any key to kill server".center(50, "-"))
    manager.shutdown()

client.py

from multiprocessing.managers import SyncManager
import multiprocessing
import sys, time

class MyManager(SyncManager):
    pass

MyManager.register("patch_dict")

if __name__ == "__main__":
    port_num = 4343

    manager = MyManager(("127.0.0.1", port_num), authkey=b"password")
    multiprocessing.current_process().authkey = b"password"
    manager.connect()
    patch_dict = manager.patch_dict()

    keys = list(patch_dict.keys())
    for key in keys:
        image_patches = patch_dict.get(key)
        # Do NN stuff (irrelevant)

These scripts work fine for sharing the images when the scripts are run in the same container. But when they are run in separate containers, like this:

# Run the container for the server
docker run -it --name cancer-1 --rm --cpus=10 --ipc=shareable cancer-env
# Run the container for the client
docker run -it --name cancer-2 --rm --cpus=10 --ipc=container:cancer-1 cancer-env

I get the following error:

Traceback (most recent call last):
  File "patch_client.py", line 22, in <module>
    manager.connect()
  File "/usr/lib/python3.5/multiprocessing/managers.py", line 455, in connect
    conn = Client(self._address, authkey=self._authkey)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused

Answer

I suggest you try using tmpfs.

It is a Linux feature that lets you create a virtual file system stored entirely in RAM. This allows very fast file access and takes as little as one bash command to set up.

In addition to being very fast and straight-forward, it has many advantages in your case:

  • No need to touch current code - the structure of the dataset stays the same
  • No extra work to create the shared dataset - just cp the dataset into the tmpfs
  • Generic interface - being a filesystem, you can easily integrate the in-RAM dataset with other components in your system that aren't necessarily written in Python. For example, it is easy to use inside your containers: just pass the mount's directory into them.
  • Will fit other environments - if your code will have to run on a different server, tmpfs can adapt and swap pages to the hard drive. If you will have to run this on a server with no free RAM, you could just have all your files on the hard drive with a normal filesystem and not touch your code at all.

Steps to use it:

  1. Create a tmpfs - sudo mount -t tmpfs -o size=600G tmpfs /mnt/mytmpfs
  2. Copy the dataset - cp -r dataset /mnt/mytmpfs
  3. Change all references to the current dataset path to point to the new one (for the containers, see the sketch below)
  4. Enjoy
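
Concretely, with the dataset sitting in /mnt/mytmpfs on the host, each experiment container can see the RAM-backed files through an ordinary read-only bind mount - a sketch reusing the container names from the question, with /data inside the container as a placeholder:

# On the host (steps 1-2 above)
sudo mount -t tmpfs -o size=600G tmpfs /mnt/mytmpfs
cp -r dataset /mnt/mytmpfs

# Each experiment container mounts the same directory read-only
docker run -it --name cancer-1 --rm --cpus=10 -v /mnt/mytmpfs:/data:ro cancer-env
docker run -it --name cancer-2 --rm --cpus=10 -v /mnt/mytmpfs:/data:ro cancer-env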

ramfs might be faster than tmpfs in some cases as it doesn't implement page swapping. To use it just replace tmpfs with ramfs in the instructions above.
