Python:是否可以在不将其内容加载到RAM的情况下写入文件? [英] Python: Can I write to a file without loading its contents in RAM?

查看:91
本文介绍了Python:是否可以在不将其内容加载到RAM的情况下写入文件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有一个我想洗牌的大数据集.整套设备无法放入RAM,因此,如果我可以同时打开多个文件(例如hdf5,numpy),按时间顺序遍历我的数据并将每个数据点随机分配给其中一个堆,则会很好(然后将每个数据点随机播放)桩).

我真的没有用python处理数据的经验,所以我不确定是否可以在不将其其余内容保存在RAM中的情况下写入文件(使用np.save和savez几乎没有成功)./p>

在h5py或numpy中是否可行?如果可以,我该怎么办?

解决方案

内存映射文件将满足您的需求.他们创建了一个numpy数组,该数组将数据保留在磁盘上,仅根据需要加载数据.完整的手册页位于此处.但是,使用它们的最简单方法是在对 np.load 的调用中传递参数 mmap_mode = r + mmap_mode = w + 磁盘上的文件(请参见此处).

我建议使用高级索引.如果数据位于一维数组 arr 中,则可以使用列表对其进行索引.因此, arr [[0,3,5]] 将为您提供 arr 的第0、3和5个元素.这将使选择改组的版本更加容易.由于这将覆盖数据,因此您需要以只读方式打开磁盘上的文件,并创建副本(使用 mmap_mode = w + )以将经过改组的数据放入其中.

Got a big data-set that I want to shuffle. The entire set won't fit into RAM so it would be good if I could open several files (e.g. hdf5, numpy) simultaneously, loop through my data chronologically and randomly assign each data-point to one of the piles (then afterwards shuffle each pile).

I'm really inexperienced with working with data in python so I'm not sure if it's possible to write to files without holding the rest of its contents in RAM (been using np.save and savez with little success).

Is this possible and in h5py or numpy and, if so, how could I do it?

解决方案

Memmory mapped files will allow for what you want. They create a numpy array which leaves the data on disk, only loading data as needed. The complete manual page is here. However, the easiest way to use them is by passing the argument mmap_mode=r+ or mmap_mode=w+ in the call to np.load leaves the file on disk (see here).

I'd suggest using advanced indexing. If you have data in a one dimensional array arr, you can index it using a list. So arr[ [0,3,5]] will give you the 0th, 3rd, and 5th elements of arr. That will make selecting the shuffled versions much easier. Since this will overwrite the data you'll need to open the files on disk read only, and create copies (using mmap_mode=w+) to put the shuffled data in.

这篇关于Python:是否可以在不将其内容加载到RAM的情况下写入文件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆