Frequently Updating Stored Data for a Numerical Experiment using Python
Question
I am running a numerical experiment that requires many iterations. After each iteration, I would like to store the data in a pickle file or pickle-like file in case the program times out or a data structure becomes tapped. What is the best way to proceed? Here is the skeleton code:
import pickle

data_dict = {}  # maybe a dictionary is not the best choice
for j in parameters:  # j = (alpha, beta, gamma); cycle through all of them
    for k in range(number_of_experiments):  # lots of experiments (10^4)
        data = experiment()  # experiment returns some numerical value;
                             # it takes ~1 second, but longer as
                             # parameters scale
        data_dict.setdefault(j, []).append(data)
        # 'wb' overwrites the file with the latest state; with 'ab' every
        # dump would append another full copy of the growing dictionary
        with open('storage.pkl', 'wb') as file:
            pickle.dump(data_dict, file)
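Questions:
- Is there a better alternative here, or another Python library that I am not aware of?
- I am using a data dictionary because it is easier to code and more flexible if I need to change things as I do more experiments. Would using a pre-allocated array be a huge advantage?
- Does opening and closing the file affect the run time? I do this so that I can check on the progress, in addition to the text logs I have set up.
Thanks for all your help!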
Recommended answer
- Assuming you are using numpy for your numerical experiments, instead of pickle I would suggest using numpy.savez.
- Keep it simple and make optimizations only if you feel that the script runs too long.
- Opening and closing files does affect the run time, but having a backup is better anyway.
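As a rough illustration of the numpy.savez suggestion, the checkpoint step could look something like the sketch below. The helper names (key_to_name, save_checkpoint, load_checkpoint), the string encoding of the parameter tuple, and the write-then-rename step are assumptions made for this example and are not part of the original answer. numpy.savez stores arrays under string names, so the (alpha, beta, gamma) tuple has to be turned into a string first.

```python
import os

import numpy as np


def key_to_name(key):
    # Hypothetical encoding: (0.1, 0.2, 0.3) -> "0.1_0.2_0.3"
    return "_".join(str(x) for x in key)


def save_checkpoint(results, path="storage.npz"):
    """Write all results collected so far to a single .npz archive."""
    arrays = {key_to_name(k): np.asarray(v) for k, v in results.items()}
    # Write to a temporary file first, then rename it over the old archive,
    # so an interrupted run does not leave a half-written checkpoint.
    tmp_path = path + ".tmp.npz"
    np.savez(tmp_path, **arrays)
    os.replace(tmp_path, path)


def load_checkpoint(path="storage.npz"):
    """Read the archive back as a dict mapping name -> numpy array."""
    with np.load(path) as archive:
        return {name: archive[name] for name in archive.files}
```

Calling save_checkpoint(data_dict) at the end of each iteration would replace the pickle.dump step. If the write turns out to be too slow, saving only every N iterations is a simple compromise.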
And I would use collections.defaultdict(list) instead of a plain dict and setdefault.
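A minimal sketch of that substitution, using made-up parameter tuples and values purely for illustration:

```python
from collections import defaultdict

# Missing keys are created automatically with an empty list,
# so no setdefault call is needed before appending.
data_dict = defaultdict(list)

data_dict[(0.1, 0.2, 0.3)].append(1.5)  # hypothetical (alpha, beta, gamma)
data_dict[(0.1, 0.2, 0.3)].append(2.5)
data_dict[(0.4, 0.5, 0.6)].append(3.5)

# Convert to a plain dict if the automatic key creation is not wanted later.
plain = dict(data_dict)
```

One thing to keep in mind is that merely reading a missing key from a defaultdict also creates it, which is why converting to a plain dict once the experiments are finished can be convenient.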