How to load one line at a time from a pickle file?

Question

I have a large dataset: a 20,000 x 40,000 numpy array. I have saved it as a pickle file.

Instead of reading this huge dataset into memory, I'd like to read only a few rows of it (say 100) at a time, for use as a minibatch.

How can I read only a few randomly chosen rows (sampled without replacement) from a pickle file?

Recommended answer

You can write pickles incrementally to a file, which allows you to load them incrementally as well.

Take the following example (a Python 2 session using cPickle). Here, we iterate over the items of a list and pickle each one in turn.

>>> import cPickle
>>> myData = [1, 2, 3]
>>> f = open('mydata.pkl', 'wb')
>>> pickler = cPickle.Pickler(f)
>>> for e in myData:
...     pickler.dump(e)
<cPickle.Pickler object at 0x7f3849818f68>
<cPickle.Pickler object at 0x7f3849818f68>
<cPickle.Pickler object at 0x7f3849818f68>
>>> f.close()
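
On Python 3, where `cPickle` no longer exists, the same idea can be sketched with the standard `pickle` module. This is only an illustration under assumptions: the file name `rows.pkl` and the helper `dump_rows` are made-up names, and small plain lists stand in for the rows of the real array.

```python
import os
import pickle
import tempfile

def dump_rows(rows, path):
    """Pickle each row as a separate object, one after another, in one file."""
    with open(path, "wb") as f:
        for row in rows:
            pickle.dump(row, f, protocol=pickle.HIGHEST_PROTOCOL)

# Small stand-in for the 20,000 x 40,000 array (assumed data).
rows = [[i * 10 + j for j in range(4)] for i in range(5)]
path = os.path.join(tempfile.gettempdir(), "rows.pkl")  # hypothetical file name
dump_rows(rows, path)

# Only the first pickled row is loaded; later rows are never unpickled.
with open(path, "rb") as f:
    first_row = pickle.load(f)
print(first_row)
```

Writing one pickle per row, rather than one pickle for the whole array, is what makes row-by-row loading possible later.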

Now we can do the same process in reverse and load each object as needed. For the purpose of example, let's say that we want only the first item and don't want to iterate over the entire file.

>>> f = open('mydata.pkl', 'rb')
>>> unpickler = cPickle.Unpickler(f)
>>> unpickler.load()
1
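
To keep loading further objects on demand, one possible approach is a small generator that calls `pickle.load` until the file is exhausted. The sketch below is Python 3 and uses an in-memory `io.BytesIO` buffer in place of a real file; the name `iter_pickles` is hypothetical.

```python
import io
import pickle

def iter_pickles(f):
    """Yield objects from a binary file object, one pickle at a time."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            # pickle.load raises EOFError once no more pickles remain.
            return

# Write three separate pickles into an in-memory buffer (stand-in for a file).
buf = io.BytesIO()
for item in [1, 2, 3]:
    pickle.dump(item, buf)
buf.seek(0)

# Stop early: the third object is never unpickled.
loaded = []
for obj in iter_pickles(buf):
    loaded.append(obj)
    if len(loaded) == 2:
        break
print(loaded)
```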

At this point, the file stream has advanced only as far as the end of the first object. The remaining objects weren't loaded, which is exactly the behavior you want. As proof, you can read the rest of the file and see that the remaining pickles are still sitting there.

>>> f.read()
'I2\n.I3\n.'
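
Sequential loading alone does not give random access, so for randomly chosen rows one option (an extension that is not part of the original answer) is to record the byte offset of each pickle while writing, then `seek` to a sampled offset before calling `pickle.load`. The sketch below assumes Python 3; `dump_with_offsets` and `load_random_rows` are hypothetical helper names, and an in-memory buffer stands in for the file.

```python
import io
import pickle
import random

def dump_with_offsets(rows, f):
    """Pickle each row separately and return the byte offset where each starts."""
    offsets = []
    for row in rows:
        offsets.append(f.tell())
        pickle.dump(row, f)
    return offsets

def load_random_rows(f, offsets, k, rng=random):
    """Load k distinct rows, chosen without replacement, by seeking to them."""
    chosen = rng.sample(range(len(offsets)), k)
    batch = []
    for i in chosen:
        f.seek(offsets[i])
        batch.append(pickle.load(f))
    return chosen, batch

# Small stand-in dataset: 10 rows of 3 values each (assumed data).
rows = [[i] * 3 for i in range(10)]
buf = io.BytesIO()
offsets = dump_with_offsets(rows, buf)

# Draw a minibatch of 4 distinct rows; a seeded generator keeps it repeatable.
chosen, batch = load_random_rows(buf, offsets, k=4, rng=random.Random(0))
print(chosen, batch)
```

The offset list is small compared with the data (one integer per row) and could itself be pickled to a separate index file so that it survives between runs.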
