Python 3.8 shared_memory resource_tracker producing unexpected warnings at application close


Question

  • I am using a multiprocessing.Pool which calls a function in 1 or more subprocesses to produce a large chunk of data.
  • The worker process creates a multiprocessing.shared_memory.SharedMemory object and uses the default name assigned by shared_memory.
  • The worker returns the string name of the SharedMemory object to the main process.
  • In the main process the SharedMemory object is linked to, consumed, and then unlinked & closed.

At shutdown I'm seeing warnings from resource_tracker:

/usr/local/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 10 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/usr/local/lib/python3.8/multiprocessing/resource_tracker.py:229: UserWarning: resource_tracker: '/psm_e27e5f9e': [Errno 2] No such file or directory: '/psm_e27e5f9e'
  warnings.warn('resource_tracker: %r: %s' % (name, e))
/usr/local/lib/python3.8/multiprocessing/resource_tracker.py:229: UserWarning: resource_tracker: '/psm_2cf099ac': [Errno 2] No such file or directory: '/psm_2cf099ac'
<8 more similar messages omitted>

Since I unlinked the shared memory objects in my main process I'm confused about what's happening here. I suspect these messages are occurring in the subprocess (in this example I tested with a process pool of size 1).

Here is a minimal reproducible example:

import multiprocessing
import multiprocessing.shared_memory as shared_memory

def create_shm():
    # Create a ~30 MB block, then detach this process's mapping with
    # close(). The underlying block (and its name) persists until
    # someone calls unlink().
    shm = shared_memory.SharedMemory(create=True, size=30000000)
    shm.close()
    return shm.name

def main():
    pool = multiprocessing.Pool(processes=4)
    tasks = [pool.apply_async(create_shm) for _ in range(200)]

    for task in tasks:
        name = task.get()
        print('Getting {}'.format(name))
        # Attach to the block the worker created, then destroy it from
        # the consuming side.
        shm = shared_memory.SharedMemory(name=name, create=False)
        shm.close()
        shm.unlink()

    pool.terminate()
    pool.join()

if __name__ == '__main__':
    main()

I have found that the example runs fine on my own laptop (Linux Mint 19.3), but on two different server machines (unknown OS configurations, and different from each other) it does exhibit the problem. In all cases I'm running the code from a Docker container, so the Python/software configuration is identical; the only difference is the Linux kernel/host OS.

I notice this documentation that might be relevant: https://docs.python.org/3.8/library/multiprocessing.html#contexts-and-start-methods
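
For context, the start method in play can be checked with the standard multiprocessing API; on Linux the default is fork:

import multiprocessing

# Report the start method new worker processes will use, plus all the
# methods available on this platform.
print(multiprocessing.get_start_method())       # typically 'fork' on Linux
print(multiprocessing.get_all_start_methods())  # e.g. ['fork', 'spawn', 'forkserver']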

I also notice that the number of "leaked shared_memory objects" varies from run to run. Since I unlink in the main process and then immediately exit, perhaps this resource_tracker (which I think is a separate process) has just not received an update before the main process exits. I don't understand the role of the resource_tracker well enough to fully understand what I just proposed, though.

Answer

In theory, and based on the current implementation of SharedMemory, the warnings should be expected. The main reason is that every shared memory object you create is tracked twice: first, when it's produced by one of the processes in the Pool object; and second, when it's consumed by the main process. This is mainly because the current implementation of SharedMemory's constructor registers the shared memory object with the resource tracker regardless of whether the create argument is True or False.
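
For reference, the tail end of the constructor in CPython 3.8's multiprocessing/shared_memory.py looks roughly like the sketch below. This is a paraphrase, not the verbatim stdlib source (the real code also handles name generation, creation flags and mmap), but the unconditional register call is the part that matters:

from multiprocessing import resource_tracker

class _SharedMemorySketch:
    """Simplified paraphrase of multiprocessing.shared_memory.SharedMemory."""
    def __init__(self, name=None, create=False, size=0):
        # ... open (create=False) or create (create=True) the underlying
        # POSIX shared memory segment and mmap it ...
        self._name = name
        # Registration happens unconditionally, so the creating process
        # *and* every attaching process each track the same segment.
        resource_tracker.register(self._name, "shared_memory")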

So, when you call shm.unlink() in the main process, you are deleting the shared memory object entirely before its producer (some process in the Pool) gets around to cleaning it up. As a result, when the pool is destroyed, each of its members (if it ever got a task) has to clean up after itself. The first warning about leaked resources probably refers to the shared memory objects actually created by processes in the Pool that were never unlinked by those same processes. And the No such file or directory warnings are due to the fact that the main process has already unlinked the files associated with the shared memory objects by the time the processes in the Pool are destroyed.

The solution provided in the linked bug report would likely prevent consuming processes from having to spawn additional resource trackers, but it does not quite prevent the issue that arises when a consuming process decides to delete a shared memory object that it did not create. This is because the process that produced the shared memory object will still have to do some clean up, i.e. some unlinking, before it exits or is destroyed.
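
In the meantime, a workaround floated around that bug report is to explicitly unregister the segment in the consuming process right after attaching. Note that multiprocessing.resource_tracker is an internal, undocumented module and _name is a private attribute (on POSIX it carries a leading slash, which is why .name won't do), so treat this as a fragile sketch rather than a supported API:

from multiprocessing import resource_tracker, shared_memory

def attach_untracked(name):
    # Attaching registers the segment with this process's resource
    # tracker as a side effect (see the constructor sketch above).
    shm = shared_memory.SharedMemory(name=name, create=False)
    # Undo that registration so only the producing process's tracker
    # owns the segment; _name is what was registered internally.
    resource_tracker.unregister(shm._name, "shared_memory")
    return shm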

The fact that you are not seeing those warnings on your laptop is quite puzzling. But it may well have to do with a combination of OS scheduling, unflushed buffers in the child processes, and the start method used when creating the process pool.

For comparison, when I use fork as the start method on my machine, I get the warnings. Otherwise, I see no warnings when spawn or forkserver is used. I added argument parsing to your code to make it easy to test the different start methods:

#!/usr/bin/env python3
# shm_test_script.py
"""
Use --start_method or -s to pick a process start method when creating a process Pool.
Use --tasks or -t to control how many shared memory objects should be created.
Use --pool_size or -p to control the number of child processes in the pool.
"""
import argparse
import multiprocessing
import multiprocessing.shared_memory as shared_memory


def create_shm():
    shm = shared_memory.SharedMemory(create=True, size=30000000)
    shm.close()
    return shm.name


def main(tasks, start_method, pool_size):
    multiprocessing.set_start_method(start_method, force=True)
    pool = multiprocessing.Pool(processes=pool_size)
    tasks = [pool.apply_async(create_shm) for _ in range(tasks)]

    for task in tasks:
        name = task.get()
        print('Getting {}'.format(name))
        shm = shared_memory.SharedMemory(name=name, create=False)
        shm.close()
        shm.unlink()
    pool.terminate()
    pool.join()


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument(
        '--start_method', '-s',
        help='The multiprocessing start method to use. Default: %(default)s',
        default=multiprocessing.get_start_method(),
        choices=multiprocessing.get_all_start_methods()
    )
    parser.add_argument(
        '--pool_size', '-p',
        help='The number of processes in the pool. Default: %(default)s',
        type=int,
        default=multiprocessing.cpu_count()
    )
    parser.add_argument(
        '--tasks', '-t',
        help='Number of shared memory objects to create. Default: %(default)s',
        default=200,
        type=int
    )
    args = parser.parse_args()
    main(args.tasks, args.start_method, args.pool_size)

Given that fork is the only method that ends up displaying the warnings (for me, at least), maybe there is actually something to the following statement about it:

The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

It's not surprising that the warnings from child processes persist/propagate if all resources of the parent are inherited by the child processes.

If you're feeling particularly adventurous, you can edit multiprocessing/resource_tracker.py and update the warnings.warn lines by adding os.getpid() to the printed strings. For instance, changing any warning containing "resource_tracker:" to "resource_tracker %d: " % (os.getpid()) should be sufficient. If you do this, you will notice that the warnings come from various processes that are neither the child processes nor the main process itself.
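
For concreteness, the two edited warnings.warn calls in Python 3.8's resource_tracker.py would look roughly like this (paraphrased from the stdlib source, so exact wording may differ across patch releases; os is already imported in that module):

# In multiprocessing/resource_tracker.py (Python 3.8), roughly:
warnings.warn('resource_tracker %d: There appear to be %d '
              'leaked %s objects to clean up at shutdown' %
              (os.getpid(), len(rtype_cache), rtype))

# ... and the per-resource warning a few lines further down:
warnings.warn('resource_tracker %d: %r: %s' % (os.getpid(), name, e))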

With those changes made, the following should help with double-checking that there are as many complaining resource trackers as your Pool size, and that their process IDs differ from those of the main process and the child processes:

chmod +x shm_test_script.py
./shm_test_script.py -p 10 -t 50 -s fork > log 2> err
awk -F ':' 'length($4) > 1 { print $4 }' err | sort | uniq -c

That should display ten lines, each of them prefixed with the number of complaints from the corresponding resource tracker. Every line should also contain a PID that differs from those of the main and child processes.

To recap, each child process should have its own resource tracker if it receives any work. Since you're not explicitly unlinking the shared memory objects in the child processes, the resources will likely get cleaned up when the child processes are destroyed.

I hope this helps answer some, if not all, of your questions.
