Python Multiprocessing and Serializing Data


Question

I am running a script on a school computer using the multiprocessing module, and I serialize the data frequently. The code below summarizes it:

import multiprocessing as mp
import time, pickle

def simulation(j):
    data = []
    for k in range(10):
        data.append(k)
        time.sleep(1)
        file = open('data%d.pkl'%j, 'wb')
        pickle.dump(data, file)
        file.close()
if __name__ == '__main__':
    processes = []
    processes.append(mp.Process(target = simulation, args = (1,) ))
    processes.append(mp.Process(target = simulation, args = (2,) ))
    for process in processes:
        process.start()
    for process in processes:
        process.join()

When I run my actual code, with many more simulations and what I imagine are more intensive and varied tasks, I get the following error: IOError: [Errno 5] Input/output error, usually preceded by file.open(...) or file.close().

My questions:

  • How can I fix this error in my script?
  • What does this error mean, for a Python newcomer? References appreciated.

Some more notes about my procedure:

  • Instead of setting the multiprocessing attribute daemon to True, I use screen to run the script and then detach. This also lets me disconnect without worrying about my script stopping.
  • This seemed to be a related question about printing using the subprocess module. As I said, I did not explicitly use daemon, so I am not sure whether it will help.
  • This usually happens after running for about a day, and it occurs on different processes at different times.

Recommended answer

Your program looks pretty good. In this case, IOError just means "bad things happened." The entire set of simulated data became too large for the Python process, so it exited with the mysterious message.
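For readers wondering what the number in the message refers to: errno 5 is the operating system's generic EIO code, which Python reports as IOError (an alias of OSError in Python 3). The following is a minimal sketch of how that code could be inspected when the exception is caught. The helper name describe_io_error is hypothetical and not part of the original answer, and the exception is raised by hand here purely for illustration.

```python
import errno
import os


def describe_io_error(exc):
    """Return a short label for an IOError/OSError based on its errno."""
    # errno.errorcode maps numeric codes to their symbolic names.
    name = errno.errorcode.get(exc.errno, 'UNKNOWN')
    return '%s (errno %s): %s' % (name, exc.errno, os.strerror(exc.errno))


if __name__ == '__main__':
    try:
        # Simulated failure; a real EIO would come from the OS during file I/O.
        raise IOError(errno.EIO, os.strerror(errno.EIO))
    except IOError as exc:
        print(describe_io_error(exc))
```

Logging a label like this next to the file name and process number may help narrow down which write failed, since the error reportedly shows up in different processes at different times.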

A couple of improvements in the following version:

  • Once some data has been produced, append it to a data file, then zap it from memory. The program should have roughly the same RAM use over time, rather than using up more and more and then crashing.

Conveniently, if a file is a concatenation of pickle objects, we can easily print out each one later for further examination. Example code is shown below.

Have fun!

import multiprocessing as mp
import glob, time, pickle, sys

def simulation(j):
    for k in range(10):
        datum = {'result': k}
        time.sleep(1)
        # Append each result right away instead of accumulating it in memory.
        with open('data%d.pkl' % j, 'ab') as dataf:
            pickle.dump(datum, dataf)

def show():
    for datname in glob.glob('data*.pkl'):
        try:
            print('*' * 8 + ' ' + datname)
            with open(datname, 'rb') as datf:
                # Load pickles one at a time until the file is exhausted.
                while True:
                    print(pickle.load(datf))
        except EOFError:
            pass

def do_sim():
    processes = []
    processes.append(mp.Process(target = simulation, args = (1,) ))
    processes.append(mp.Process(target = simulation, args = (2,) ))
    for process in processes:
        process.start()
    for process in processes:
        process.join()

if __name__ == '__main__':
    if '--show' in sys.argv:
        show()
    else:
        do_sim()



Output of "python ./msim.py --show":

******** data2.pkl
{'result': 0}
{'result': 1}
{'result': 2}
{'result': 3}
{'result': 4}
{'result': 5}
{'result': 6}
{'result': 7}
{'result': 8}
{'result': 9}
******** data1.pkl
{'result': 0}
{'result': 1}
{'result': 2}
{'result': 3}
{'result': 4}
{'result': 5}
{'result': 6}
{'result': 7}
{'result': 8}
{'result': 9}
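
The reading loop in show() can also be wrapped in a generator, so the stored records can be processed rather than only printed. This is a minimal sketch, assuming files written in the same append style as simulation(). The name iter_pickles and the demo file name are hypothetical, introduced only for illustration.

```python
import os
import pickle
import tempfile


def iter_pickles(path):
    """Yield each object from a file of back-to-back pickle dumps."""
    with open(path, 'rb') as datf:
        while True:
            try:
                yield pickle.load(datf)
            except EOFError:
                # No more pickles left in the file.
                return


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'demo.pkl')
    for k in range(3):
        # Append one small record per step, as simulation() does.
        with open(path, 'ab') as dataf:
            pickle.dump({'result': k}, dataf)
    print([d['result'] for d in iter_pickles(path)])
```

Because the generator yields one record at a time, reading a large result file back should not require holding all of it in memory either.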
