Share Python objects between multiple processes in Python 3


Question

Here I create a producer-consumer program: the parent process (producer) creates many child processes (consumers), then the parent process reads a file and passes the data to the child processes.

But here comes a performance problem: passing messages between processes costs too much time (I think).

For example, with 200 MB of original data, the parent process takes less than 8 seconds to read and preprocess it, but just passing the data to the child processes via multiprocessing.Pipe costs another 8 seconds, while the child processes take only another 3-4 seconds to do the remaining work.
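
For illustration, here is a minimal sketch (not from the original post) of the pattern just described: the parent sends a large payload to a child over multiprocessing.Pipe and times the transfer. The 200 MB size mirrors the example above.

import multiprocessing
import time

def consumer(conn):
    data = conn.recv()          # blocks until the parent's payload arrives
    conn.send(len(data))        # acknowledge so the parent can stop the clock

if __name__ == '__main__':      # the guard is required on Windows (spawn)
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=consumer, args=(child_conn,))
    p.start()
    payload = b'x' * (200 * 1024 * 1024)   # ~200 MB of raw data
    start = time.time()
    parent_conn.send(payload)              # the payload is pickled and copied
    parent_conn.recv()                     # wait for the child's acknowledgement
    print('pipe transfer took %.2f s' % (time.time() - start))
    p.join()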

So a complete workflow costs less than 18 seconds, and more than 40% of that time is spent on communication between processes. That is much more than I expected. I also tried multiprocessing.Queue and Manager, and they were worse.

I work with Windows 7 / Python 3.4. I have googled for several days; POSH might be a good solution, but it can't be built with Python 3.4.

I have 3 questions:

1. Is there any way to share Python objects directly between processes in Python 3.4, as POSH does?

2. Is it possible to pass a "pointer" to an object to a child process, and have the child process recover the "pointer" into a Python object?

3. multiprocessing.Array may be a valid solution, but if I want to share a complex data structure such as a list, how does it work? Should I build a new class based on it and provide list interfaces?

Edit 1:
I tried the 3rd way, but it works worse.
I defined these values:

import multiprocessing

buff_len = 200 * 1024 * 1024                      # buffer size in bytes (example value; not given in the original)
p_pos = multiprocessing.Value('i')                # producer write position
c_pos = multiprocessing.Value('i')                # consumer read position
databuff = multiprocessing.Array('c', buff_len)   # shared buffer

and two functions:

send_data(msg)  
get_data()

In the send_data function (parent process), it copies msg into databuff and sends the start and end positions (two integers) to the child process via a pipe.
Then, in the get_data function (child process), it receives the two positions and copies the msg out of databuff.
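
A runnable sketch of this scheme follows, assuming a single producer, a single consumer, no buffer wrap-around, and an illustrative buffer size; none of these details are fixed by the original post, and the p_pos/c_pos counters are folded into the (start, end) message here.

import multiprocessing

BUFF_LEN = 1024 * 1024                            # example buffer size

def send_data(conn, databuff, msg):
    """Parent: copy msg into the shared buffer, send (start, end) via the pipe."""
    start, end = 0, len(msg)
    databuff[start:end] = msg                     # copy into shared memory
    conn.send((start, end))                       # only two integers cross the pipe

def get_data(conn, databuff):
    """Child: receive (start, end) and copy the msg back out of the buffer."""
    start, end = conn.recv()
    return databuff[start:end]

def consumer(conn, databuff):
    msg = get_data(conn, databuff)
    print('child got %d bytes' % len(msg))

if __name__ == '__main__':
    # On Windows (spawn) the Array must be passed to the child as an argument,
    # or the child would create its own unshared copy at import time.
    databuff = multiprocessing.Array('c', BUFF_LEN)
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=consumer, args=(child_conn, databuff))
    p.start()
    send_data(parent_conn, databuff, b'hello shared buffer')
    p.join()

Note that only the two integers travel through the pipe; the payload itself stays in the shared Array, which is the point of the scheme.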

In the end, it costs twice as much as just using a pipe @_@


Edit 2:
Yes, I tried Cython, and the result looks good.
I just changed my Python script's suffix to .pyx and compiled it, and the program sped up by 15%.
No doubt, I hit the "Unable to find vcvarsall.bat" and "The system cannot find the file specified" errors; I spent a whole day solving the first one and was blocked by the second.
Finally, I found Cyther, and all my troubles were gone ^_^.
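
For reference, the compile step mentioned above typically goes through a minimal setup.py like the sketch below (this is standard Cython usage, not part of the original post; the script name is a placeholder).

# Minimal Cython build script; run: python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize('myscript.pyx'))  # 'myscript.pyx' is a placeholder name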

Answer

I was in your place five months ago. I looked around a few times, but my conclusion is that multiprocessing with Python has exactly the problem you describe:

  • Pipes and Queues are good, but from my experience not for big objects.
  • Manager() proxy objects are slow, except for arrays, and those are limited. If you want to share a complex data structure, use a Namespace as is done here: multiprocessing in python - sharing large object (e.g. pandas dataframe) between multiple processes (see the sketch after this list).
  • Manager() has the shared list you are looking for: https://docs.python.org/3.6/library/multiprocessing.html
  • There are no pointers or real memory management in Python, so you can't share selected memory cells.
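
As a small illustration of the Manager() options in the list above (the names are mine, not the answerer's): a shared list plus a Namespace holding a dict. The usual caveat applies: objects stored on a Namespace must be reassigned, not mutated in place, for changes to propagate.

import multiprocessing

def worker(shared_list, ns):
    shared_list.append('from child')   # proxied call, goes through the manager process
    print(ns.config['mode'])           # reads a copy of the dict off the namespace

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        shared_list = manager.list(['from parent'])
        ns = manager.Namespace()
        ns.config = {'mode': 'fast'}   # assign the whole object; in-place edits would not propagate
        p = multiprocessing.Process(target=worker, args=(shared_list, ns))
        p.start()
        p.join()
        print(list(shared_list))       # ['from parent', 'from child']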

I solved this kind of problem by learning C++, but it's probably not what you want to read...
