Broken Pipe when Using Python Multiprocessing Managers (BaseManager/SyncManager) to Share Queue with Remote Machines


Problem description

In the last month, we've had a persistent problem with the Python 2.6.x multiprocessing package when we've tried to use it to share a queue among several different Linux computers. I've posed this question directly to Jesse Noller as well, since we haven't yet found anything that elucidates the issue on StackOverflow, in the Python docs, in the source code, or elsewhere online.

Our team of engineers hasn't been able to solve this one, and we've posed the question to quite a few people in Python user groups to no avail. I was hoping someone could offer some insight, since I feel like we're doing something incorrect but are too close to the problem to see it for what it is.

Here are the symptoms:

Traceback (most recent call last):
  File "/var/django_root/dev/com/brightscope/data/processes/daemons/deferredupdates/servers/queue_server.py", line 65, in get_from_queue
    return queue, queue.get(block=False)
  File "<string>", line 2, in get
  File "/usr/local/lib/python2.6/multiprocessing/managers.py", line 725, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe

(I'm showing where our code calls queue.get() on a shared queue object, hosted by a manager that extends SyncManager.)

What's peculiar about the issue is that if we connect to this shared queue on a single machine (let's call this machine A), even from lots of concurrent processes, we never seem to run into an issue. It's only when we connect to the queue (again, using a class that extends multiprocessing SyncManager and currently adds no additional functionality) from other machines (let's call these machines B and C) and run a high volume of items into and out of the queue at the same time that we experience a problem.
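
For readers who want to picture the arrangement, here is a minimal sketch of a setup like the one described above. It is not our production code: the names QueueManager, get_shared_queue, start_server, connect_client and AUTHKEY are hypothetical, it is written for Python 3 (where Queue.Queue is spelled queue.Queue), and it hosts the server in a background thread on the loopback interface purely so the example runs on one machine. On machine A the server would normally run in its own process and bind to an address reachable from machines B and C.

```python
import queue
import threading
from multiprocessing.managers import SyncManager

AUTHKEY = b"example-key"        # hypothetical shared secret
_shared_queue = queue.Queue()   # lives in the server process (machine A)


def get_shared_queue():
    """Return the single queue instance that every client shares."""
    return _shared_queue


class QueueManager(SyncManager):
    """Extends SyncManager without adding any functionality."""


# The server side registers the name together with the callable that
# produces the object. A client on another machine would typically
# register only the name: QueueManager.register("get_queue").
QueueManager.register("get_queue", callable=get_shared_queue)


def start_server(host="127.0.0.1", port=0):
    """Serve the queue from a daemon thread and return the bound address."""
    manager = QueueManager(address=(host, port), authkey=AUTHKEY)
    server = manager.get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def connect_client(address):
    """Connect the way machines B and C would and return a queue proxy."""
    manager = QueueManager(address=address, authkey=AUTHKEY)
    manager.connect()
    return manager.get_queue()


def demo():
    """Put one item through one proxy and read it back through another."""
    address = start_server()
    producer = connect_client(address)
    consumer = connect_client(address)
    producer.put("job-1")
    return consumer.get(block=False)
```

Every put() and get() on the proxy is a request sent over a socket to the server, which is why a dropped connection surfaces inside _callmethod as in the traceback above.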

It is as though Python's multiprocessing package handles local connections in a way that works from machine A (even though they still use the same manager.connect() method), but when remote connections are made simultaneously from at least one of machines B or C, we get a Broken pipe error.

In all the reading my team has done, we thought the problem was related to locking. We thought maybe we shouldn't use Queue.Queue, but instead multiprocessing.Queue, but we switched and the problem persisted (we also noticed that SyncManager's own shared Queue is an instance of Queue.Queue).

We are pulling our hair out about how to even debug the issue, since it's hard to reproduce but does happen fairly frequently (many times per day if we are inserting and .get()ing lots of items from the queue).

The method we created, get_from_queue, retries fetching an item from the queue about 10 times with randomized sleep intervals, but if it fails once, it seems to fail all ten times. That led me to believe that calling .register() and .connect() on a manager perhaps doesn't open another socket connection to the server, but I couldn't confirm this either by reading the docs or by looking at the Python source code.

Can anyone provide any insight into where we might look or how we might track what's actually happening?

How can we start a new connection in the event of a broken pipe using multiprocessing.BaseManager or multiprocessing.SyncManager?

How can we prevent the broken pipe in the first place?
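
On the question of starting a new connection, one avenue worth testing is sketched below. It rests on an assumption taken from reading CPython's managers.py, not from the documentation: a proxy appears to cache one connection per thread in a private _tls attribute and to reuse it on every call, which would explain why all ten retries fail once the pipe breaks. The helper names reset_proxy_connection and get_with_reconnect are hypothetical, _tls is an undocumented internal that may change between versions, and the code is written for Python 3 (on 2.6 the exception to catch would be IOError).

```python
import random
import time


def reset_proxy_connection(proxy):
    """Drop the connection cached on the proxy for the current thread.

    Relies on the private `_tls` attribute (an assumption about CPython
    internals). Returns True if a cached connection was discarded.
    """
    tls = getattr(proxy, "_tls", None)
    conn = getattr(tls, "connection", None)
    if conn is None:
        return False
    try:
        conn.close()
    except OSError:
        pass  # the socket is already unusable; nothing more to do
    del tls.connection  # the next proxy call should open a new connection
    return True


def get_with_reconnect(proxy, attempts=10, max_sleep=0.5):
    """Call proxy.get(block=False), resetting the connection after failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return proxy.get(block=False)
        except (OSError, EOFError) as exc:  # includes BrokenPipeError
            last_error = exc
            reset_proxy_connection(proxy)
            time.sleep(random.uniform(0, max_sleep))
    raise last_error
```

An empty queue still raises queue.Empty to the caller, since that is not a connection failure. This only addresses recovery after the pipe breaks; it does not explain or prevent the break itself.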

Recommended answer

FYI, in case anyone else runs into this same error: after extensive consulting with Ask Solem and Jesse Noller of Python's core dev team, it looks like this is actually a bug in current Python 2.6.x (and possibly 2.7+ and possibly 3.x). They are looking at possible solutions, and a fix will probably be included in a future version of Python.
