将大型NumPy数组写入文件的有效方法 [英] Efficient ways to write a large NumPy array to a file

查看:97
本文介绍了将大型NumPy数组写入文件的有效方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前在PiCloud上运行了一个项目,该项目涉及ODE解算器的多次迭代.每次迭代都会生成一个约30行和1500列的NumPy数组,每次迭代都将附加到先前结果数组的底部.

I've currently got a project running on PiCloud that involves multiple iterations of an ODE Solver. Each iteration produces a NumPy array of about 30 rows and 1500 columns, with each iterations being appended to the bottom of the array of the previous results.

通常,我只是让函数返回这些相当大的数组,将它们保存在内存中并全部处理. PiCloud对函数可以输出和返回的数据大小有相当严格的限制,以降低传输成本.很好,除了那意味着我必须启动成千上万的作业,每个作业都是在迭代中运行,并且开销很大.

Normally, I'd just let these fairly big arrays be returned by the function, hold them in memory and deal with them all at one. Except PiCloud has a fairly restrictive cap on the size of the data that can be out and out returned by a function, to keep down on transmission costs. Which is fine, except that means I'd have to launch thousands of jobs, each running on iteration, with considerable overhead.

看来,最好的解决方案是将输出写入文件,然后使用他们具有的没有传输限制的另一个功能来收集文件.

It appears the best solution to this is to write the output to a file, and then collect the file using another function they have that doesn't have a transfer limit.

我最好的选择就是将其转储到CSV文件中吗?我应该在每次迭代时将其添加到CSV文件中,还是将其全部保留在数组中直到结束,然后只写一次?我缺少什么聪明的东西吗?

Is my best bet to do this just dumping it into a CSV file? Should I add to the CSV file each iteration, or hold it all in an array until the end and then just write once? Is there something terribly clever I'm missing?

推荐答案

除非中间文件有人类可读的原因,否则请不要使用CSV,因为这不可避免地会涉及到精度下降.

Unless there is a reason for the intermediate files to be human-readable, do not use CSV, as this will inevitably involve a loss of precision.

效率最高的可能是tofile( doc ),该功能旨在在您提前知道数据的所有属性时将文件快速转储到磁盘.

The most efficient is probably tofile (doc) which is intended for quick dumps of file to disk when you know all of the attributes of the data ahead of time.

对于独立于平台但特定于numpy的保存,可以使用save(

For platform-independent, but numpy-specific, saves, you can use save (doc).

如果需要便携性,Numpy和scipy还支持多种科学数据格式,例如HDF5.

Numpy and scipy also have support for various scientific data formats like HDF5 if you need portability.

这篇关于将大型NumPy数组写入文件的有效方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆