Recording data in a long-running Python simulation

Question

I am running a simulation from which I need to record some small NumPy arrays every cycle. My current solution is to load the existing record, stack the new array onto it, and save the whole thing back:

import numpy as np

# Reload the whole record, stack the new per-cycle array, rewrite everything
existing_data = np.load("existing_record.npy")
updated = np.dstack((existing_data, new_array[..., None]))
np.save("existing_record.npy", updated)

This has become a serious performance bottleneck: the simulation runs at half speed with this method. I have considered appending the arrays to a list and writing it out at the end of the simulation, but that could run out of RAM or lose the data in a crash. Are there any standard solutions for this kind of problem?
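One standard middle ground (a sketch only, not from the original post; the chunk size, filename, and per-cycle array shape below are hypothetical) is to buffer a bounded number of arrays in memory and append each full chunk to a file opened in binary append mode, so both peak RAM use and the data at risk in a crash are capped at one chunk:

import numpy as np

FLUSH_EVERY = 100  # hypothetical chunk size: trades RAM use against write frequency
buffer = []

with open("record.bin", "ab") as f:  # append mode: a crash loses at most one chunk
    for cycle in range(1000):  # stands in for the simulation loop
        new_array = np.random.rand(8, 8).astype(np.float32)  # placeholder per-cycle data
        buffer.append(new_array)
        if len(buffer) >= FLUSH_EVERY:
            f.write(np.stack(buffer).tobytes())  # one sequential write per chunk
            f.flush()
            buffer.clear()
    if buffer:  # write any arrays left over after the last full chunk
        f.write(np.stack(buffer).tobytes())

The recorded arrays can then be restored with np.fromfile("record.bin", dtype=np.float32).reshape(-1, 8, 8).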

Answer

I think one solution is to use a memory-mapped file through numpy.memmap. The code can be found below; the numpy.memmap documentation contains important information for understanding it.

import numpy as np
from os.path import getsize

from time import time

filename = "data.bin"

# Datatype used for memmap
dtype = np.int32

# Create the memmap for the first time (mode='w+'). The shape here is
# arbitrary; if the final size can be estimated, it is better to use it.
mm = np.memmap(filename, dtype=dtype, mode='w+', shape=(1, ))
print("File has {} bytes".format(getsize(filename)))


N = 20
num_data_per_loop = 10**7

# Main loop to append data
for i in range(N):

    # will extend the file because mode='r+'
    starttime = time()
    mm = np.memmap(filename,
                   dtype=dtype,
                   mode='r+',
                   offset=np.dtype(dtype).itemsize*num_data_per_loop*i,
                   shape=(num_data_per_loop, ))
    mm[:] = np.arange(start=num_data_per_loop*i, stop=num_data_per_loop*(i+1))
    mm.flush()
    endtime = time()
    print("{:3d}/{:3d} ({:6.4f} sec): File has {} bytes".format(i, N, endtime-starttime, getsize(filename)))

A = np.array(np.memmap(filename, dtype=dtype, mode='r'))
if np.array_equal(A, np.arange(num_data_per_loop*N, dtype=dtype)):
    print("Correct")

The output I get is:

File has 4 bytes
  0/ 20 (0.2167 sec): File has 40000000 bytes
  1/ 20 (0.2200 sec): File has 80000000 bytes
  2/ 20 (0.2131 sec): File has 120000000 bytes
  3/ 20 (0.2180 sec): File has 160000000 bytes
  4/ 20 (0.2215 sec): File has 200000000 bytes
  5/ 20 (0.2141 sec): File has 240000000 bytes
  6/ 20 (0.2187 sec): File has 280000000 bytes
  7/ 20 (0.2138 sec): File has 320000000 bytes
  8/ 20 (0.2137 sec): File has 360000000 bytes
  9/ 20 (0.2227 sec): File has 400000000 bytes
 10/ 20 (0.2168 sec): File has 440000000 bytes
 11/ 20 (0.2141 sec): File has 480000000 bytes
 12/ 20 (0.2150 sec): File has 520000000 bytes
 13/ 20 (0.2144 sec): File has 560000000 bytes
 14/ 20 (0.2190 sec): File has 600000000 bytes
 15/ 20 (0.2186 sec): File has 640000000 bytes
 16/ 20 (0.2210 sec): File has 680000000 bytes
 17/ 20 (0.2146 sec): File has 720000000 bytes
 18/ 20 (0.2178 sec): File has 760000000 bytes
 19/ 20 (0.2182 sec): File has 800000000 bytes
Correct

The time per iteration is approximately constant because of the offsets used for the memmap, and the amount of RAM needed (apart from loading the whole memmap for the check at the end) is also constant.
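If even that final check should stay memory-bounded, the memmap can be verified chunk by chunk instead of being materialized as one array (a sketch reusing the names from the script above; it replaces the np.array(...) check and is not part of the original answer):

# Chunked verification: slicing a memmap only pages in the touched region,
# so peak RAM stays at roughly one chunk regardless of file size.
mm_check = np.memmap(filename, dtype=dtype, mode='r')
ok = True
for start in range(0, num_data_per_loop * N, num_data_per_loop):
    stop = start + num_data_per_loop
    if not np.array_equal(mm_check[start:stop],
                          np.arange(start, stop, dtype=dtype)):
        ok = False
        break
print("Correct" if ok else "Mismatch in chunk starting at {}".format(start))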

I hope this solves your performance issues.

Kind regards,

Lucas

Edit 1: It seems the poster has solved his own question. I leave this answer up as an alternative.
