Frequently Updating Stored Data for a Numerical Experiment using Python


Question

I am running a numerical experiment that requires many iterations. After each iteration, I would like to store the data in a pickle file or a pickle-like file, in case the program times out or a data structure becomes tapped out. What is the best way to proceed? Here is the skeleton code:

import pickle

data_dict = {}                       # maybe a dictionary is not the best choice
for j in parameters:                 # j = (alpha, beta, gamma) and cycle through
    for k in number_of_experiments:  # lots of experiments (10^4)
        data = experiment()          # experiment returns some numerical value
                                     # experiment takes ~1 second, but increases
                                     # as parameters scale
        data_dict.setdefault(j, []).append(data)
        # 'ab' appends another pickle of the whole dict on every iteration
        with open('storage.pkl', 'ab') as file:
            pickle.dump(data_dict, file)

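Questions:

  1. Is there a better option here? Or another Python library that I am not aware of?
  2. I am using a data dictionary because it is easier to code and more flexible if I need to change things as I run more experiments. Would a pre-allocated array be a huge advantage?
  3. Does opening and closing the file affect run time? I do this so that I can check on progress, in addition to the text log I have set up.

Thank you for all your help!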

Recommended answer

  1. Assuming you are using numpy for your numerical experiments, I would suggest numpy.savez instead of pickle.
  2. Keep it simple, and optimize only if you feel that the script runs too long.
  3. Opening and closing files does affect the run time, but having a backup is better anyway.
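
To make the numpy.savez suggestion concrete, here is a minimal sketch of a checkpoint that is rewritten after each iteration. It assumes the results live in a dict keyed by parameter tuples, as in the question. The helpers `key_for`, `save_checkpoint`, and `load_checkpoint` are hypothetical names introduced for illustration; they are not part of numpy. Array names in an .npz archive are strings, so the tuple keys are converted first.

```python
import os
import tempfile

import numpy as np


def key_for(params):
    """Hypothetical helper: turn a parameter tuple into a string array name."""
    return "_".join(str(p) for p in params)


def save_checkpoint(results, path):
    """Write every result list into one .npz archive, replacing the old file."""
    arrays = {key_for(params): np.asarray(values)
              for params, values in results.items()}
    np.savez(path, **arrays)


def load_checkpoint(path):
    """Read the archive back into a plain dict of name -> array."""
    with np.load(path) as archive:
        return {name: archive[name] for name in archive.files}


# Placeholder data standing in for real experiment output.
results = {(0.1, 0.2, 0.3): [1.0, 2.0], (0.4, 0.5, 0.6): [3.0]}
checkpoint_path = os.path.join(tempfile.mkdtemp(), "storage.npz")

save_checkpoint(results, checkpoint_path)
restored = load_checkpoint(checkpoint_path)
```

Each call rewrites the whole archive, so the cost grows with the amount of data collected so far. If that becomes noticeable, saving every N iterations instead of every iteration is one simple adjustment.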

I would also use collections.defaultdict(list) instead of a plain dict with setdefault.
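
As a rough sketch of the defaultdict suggestion, the loop from the question might look like the following. The `experiment` function, the parameter tuples, and the run count are placeholders invented for illustration. The sketch also opens the file in "wb" mode rather than "ab", which is a change from the question's code: with "ab" every dump appends another full pickle, so the file keeps growing and a single `pickle.load` returns only the first, oldest one.

```python
import os
import pickle
import random
import tempfile
from collections import defaultdict


def experiment(params):
    """Hypothetical stand-in for the real experiment: returns one number."""
    alpha, beta, gamma = params
    return alpha + beta + gamma + random.random()


parameters = [(1, 2, 3), (4, 5, 6)]   # placeholder parameter tuples
runs_per_setting = 3                  # the question uses about 10^4
checkpoint_path = os.path.join(tempfile.mkdtemp(), "storage.pkl")

results = defaultdict(list)           # missing keys start as an empty list
for params in parameters:
    for _ in range(runs_per_setting):
        results[params].append(experiment(params))
        # "wb" overwrites, so the file always holds exactly one pickle
        with open(checkpoint_path, "wb") as handle:
            pickle.dump(results, handle)

# Reading the checkpoint back, for example after a crash or timeout.
with open(checkpoint_path, "rb") as handle:
    restored = pickle.load(handle)
```

The restored object is again a defaultdict, so a resumed run can keep appending to it without any setdefault calls.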
