How to synchronize a python dict with multiprocessing


Problem description



I am using Python 2.6 and the multiprocessing module for parallel processing. Now I would like to have a synchronized dict (where the only atomic operation I really need is the += operator on a value).

Should I wrap the dict with a multiprocessing.sharedctypes.synchronized() call? Or is another way the way to go?

Solution

Intro

There seems to be a lot of arm-chair suggestions and no working examples. None of the answers listed here even suggest using multiprocessing and this is quite a bit disappointing and disturbing. As python lovers we should support our built-in libraries, and while parallel processing and synchronization is never a trivial matter, I believe it can be made trivial with proper design. This is becoming extremely important in modern multi-core architectures and cannot be stressed enough! That said, I am far from satisfied with the multiprocessing library, as it is still in its infancy stages with quite a few pitfalls, bugs, and being geared towards functional programming (which I detest). Currently I still prefer the Pyro module (which is way ahead of its time) over multiprocessing due to multiprocessing's severe limitation in being unable to share newly created objects while the server is running. The "register" class-method of the manager objects will only actually register an object BEFORE the manager (or its server) is started. Enough chatter, more code:

Server.py

from multiprocessing.managers import SyncManager


class MyManager(SyncManager):
    pass


syncdict = {}
def get_dict():
    return syncdict

if __name__ == "__main__":
    MyManager.register("syncdict", get_dict)
    manager = MyManager(("127.0.0.1", 5000), authkey="password")
    manager.start()
    raw_input("Press any key to kill server".center(50, "-"))
    manager.shutdown()

In the above code example, Server.py makes use of multiprocessing's SyncManager, which can supply synchronized shared objects. This code will not work running in the interpreter because the multiprocessing library is quite touchy about how to find the "callable" for each registered object. Running Server.py will start a customized SyncManager that shares the syncdict dictionary for use by multiple processes, and clients can connect to it either on the same machine or, if the server is run on an IP address other than loopback, from other machines. In this case the server runs on loopback (127.0.0.1) on port 5000. The authkey parameter secures the connection used to manipulate syncdict. When any key is pressed the manager is shut down.

Client.py

from multiprocessing.managers import SyncManager
import sys, time

class MyManager(SyncManager):
    pass

MyManager.register("syncdict")

if __name__ == "__main__":
    manager = MyManager(("127.0.0.1", 5000), authkey="password")
    manager.connect()
    syncdict = manager.syncdict()

    print "dict = %s" % (dir(syncdict))
    key = raw_input("Enter key to update: ")
    inc = float(raw_input("Enter increment: "))
    sleep = float(raw_input("Enter sleep time (sec): "))

    try:
         #if the key doesn't exist create it
         if not syncdict.has_key(key):
             syncdict.update([(key, 0)])
         #increment key value every sleep seconds
         #then print syncdict
         while True:
              syncdict.update([(key, syncdict.get(key) + inc)])
              time.sleep(sleep)
              print "%s" % (syncdict)
    except KeyboardInterrupt:
         print "Killed client"

The client must also create a customized SyncManager, registering "syncdict", this time without passing in a callable to retrieve the shared dict. It then uses the customized SyncManager to connect to the manager started in Server.py using the loopback IP address (127.0.0.1) on port 5000 and an authkey establishing a secure connection. It retrieves the shared dict syncdict by calling the registered callable on the manager. It prompts the user for the following:

  1. The key in syncdict to operate on
  2. The amount to increment the value accessed by the key every cycle
  3. The amount of time to sleep per cycle in seconds
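The server/client pairing described above can also be sketched in a single Python 3 script. This is only an illustration of the same register/start/connect handshake, not the author's original code: under Python 3 the authkey must be bytes, port 0 lets the OS pick a free port (server.address then reports the chosen one), and names such as get_store are mine.

```python
from multiprocessing.managers import SyncManager

_store = {}

def get_store():
    return _store

class ServerManager(SyncManager):
    pass

class ClientManager(SyncManager):
    pass

# server side registers with a callable; the client registers the name only
ServerManager.register("syncdict", get_store)
ClientManager.register("syncdict")

def demo():
    server = ServerManager(("127.0.0.1", 0), authkey=b"password")
    server.start()  # port 0: the OS picks a port, server.address reports it
    try:
        client = ClientManager(server.address, authkey=b"password")
        client.connect()
        d = client.syncdict()        # proxy for the dict living in the server
        d.update([("key", 0)])
        d.update([("key", d.get("key") + 2.5)])
        return d.get("key")
    finally:
        server.shutdown()
```

Note that, as in Server.py, both registrations happen before the manager is started or connected.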

The client then checks to see if the key exists. If it doesn't, it creates the key in syncdict. The client then enters an "endless" loop where it updates the key's value by the increment, sleeps the specified amount, and prints syncdict, repeating this process until a KeyboardInterrupt occurs (Ctrl+C).
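One caveat worth flagging about that loop: the get-then-update is two separate round-trips to the manager, so it is not atomic if several clients hit the same key. A minimal sketch of guarding it with a shared lock, using Python 3's multiprocessing.Manager (names and counts here are illustrative, not from the answer's code):

```python
from multiprocessing import Manager, Process

def bump(shared, lock, key, inc, times):
    # get-then-set on a proxy is two round-trips; hold the lock across both
    for _ in range(times):
        with lock:
            shared[key] = shared[key] + inc

def demo():
    with Manager() as mgr:
        shared = mgr.dict()   # DictProxy: supports plain indexing
        lock = mgr.Lock()
        shared["count"] = 0
        procs = [Process(target=bump, args=(shared, lock, "count", 1, 50))
                 for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return shared["count"]
```

With the lock held, four processes doing 50 increments each always land on 200; without it, updates can be lost between the read and the write.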

Annoying problems

  1. The Manager's register methods MUST be called before the manager is started otherwise you will get exceptions even though a dir call on the Manager will reveal that it indeed does have the method that was registered.
  2. All manipulations of the dict must be done with methods and not dict assignments (syncdict["blast"] = 2 will fail miserably because of the way multiprocessing shares custom objects)
  3. Using SyncManager's dict method would alleviate annoying problem #2 except that annoying problem #1 prevents the proxy returned by SyncManager.dict() being registered and shared. (SyncManager.dict() can only be called AFTER the manager is started, and register will only work BEFORE the manager is started so SyncManager.dict() is only useful when doing functional programming and passing the proxy to Processes as an argument like the doc examples do)
  4. The server AND the client both have to register even though intuitively it would seem like the client would just be able to figure it out after connecting to the manager (Please add this to your wish-list multiprocessing developers)
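Problems #1–#3 can be demonstrated in a few lines. The sketch below (Python 3, illustrative names, port 0 so the OS picks a port) registers a plain dict before start, as #1 requires; item assignment on its AutoProxy then fails as #2 describes, while the DictProxy from the manager's built-in dict method accepts it:

```python
from multiprocessing.managers import SyncManager

_plain = {}

def get_plain():
    return _plain

class DemoManager(SyncManager):
    pass

# a plain dict registered this way is served through an AutoProxy, which
# exposes only the dict's public methods, not item assignment (problem #2)
DemoManager.register("plain", get_plain)  # must happen before start (problem #1)

def demo():
    mgr = DemoManager(("127.0.0.1", 0), authkey=b"pw")
    mgr.start()
    try:
        plain = mgr.plain()
        try:
            plain["x"] = 1            # AutoProxy has no __setitem__
            assignment_worked = True
        except TypeError:
            assignment_worked = False
        shared = mgr.dict()           # DictProxy: plain indexing works...
        shared["x"] = 1               # ...but only obtainable AFTER start (problem #3)
        return assignment_worked, shared["x"]
    finally:
        mgr.shutdown()
```

As problem #3 notes, the DictProxy obtained this way cannot itself be registered for clients to look up; it can only be handed to child processes as an argument.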

Closing

I hope you enjoyed this quite thorough and slightly time-consuming answer as much as I have. I was having a great deal of trouble getting straight in my mind why I was struggling so much with the multiprocessing module where Pyro makes it a breeze, and now thanks to this answer I have hit the nail on the head. I hope this is useful to the python community on how to improve the multiprocessing module, as I do believe it has a great deal of promise but in its infancy falls short of what is possible. Despite the annoying problems described I think this is still quite a viable alternative and is pretty simple. You could also use SyncManager.dict() and pass it to Processes as an argument the way the docs show; it would probably be an even simpler solution depending on your requirements, it just feels unnatural to me.
