Loading arrays saved using numpy.save in append mode

Question

I save arrays using numpy.save() in append mode:

import numpy as np

with open("try.npy", 'ab') as f:  # append, binary mode
    np.save(f, [1, 2, 3, 4, 5])
    np.save(f, [6, 7, 8, 9, 10])

Can I then load the data in LIFO mode? Namely, if I now want to load the 6-10 array, do I need to load twice (and use b):

import numpy as np

with open("try.npy", 'rb') as f:  # .npy data must be read in binary mode
    a = np.load(f)  # first saved array
    b = np.load(f)  # second saved array

Or can I directly load the second appended save?

Answer

I'm a little surprised that this sequential save and load works. I don't think it is documented (please correct me). But evidently each save is a self-contained unit, and load reads to the end of that unit, as opposed to the end of the file.

Think of each load as a readline. You can't read just the last line of a file; you have to read all the ones before it.

Well, there is a way of reading just the last one: use seek to move the file's read position to a specific point. But to do that, you have to know exactly where the desired block starts.
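A minimal sketch of that idea, assuming you can keep track of the offsets yourself (here they just live in a Python list; in practice you would have to store them somewhere alongside the file). Calling f.tell() just before each save gives the position where that block starts, and seek can later jump straight to it:

```python
import numpy as np

arrays = [np.arange(1, 6), np.arange(6, 11)]

# Record the byte offset at which each saved block begins.
offsets = []
with open("try.npy", "wb") as f:
    for arr in arrays:
        offsets.append(f.tell())
        np.save(f, arr)

# Later: jump directly to the last block and load only that one.
with open("try.npy", "rb") as f:
    f.seek(offsets[-1])
    last = np.load(f)

print(last)  # [ 6  7  8  9 10]
```

The sketch writes a fresh file with "wb" so the offsets start at 0; the offset bookkeeping is the part you would need to persist yourself.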

np.savez is the intended way of saving multiple arrays to a file, or rather to a zip archive.
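For comparison, a minimal np.savez sketch (the archive name and the keys first/second are illustrative). Each array is stored under a name, and indexing the loaded archive reads just that member, so there is no need to go through the earlier ones:

```python
import numpy as np

# Each keyword argument becomes a separate .npy member inside the zip archive.
np.savez("arrays.npz", first=np.arange(1, 6), second=np.arange(6, 11))

# np.load returns an NpzFile; indexing by name reads only that member.
with np.load("arrays.npz") as data:
    print(data.files)
    second = data["second"]  # no need to read 'first' before it

print(second)  # [ 6  7  8  9 10]
```

The trade-off is that np.savez writes the whole archive in one call, so it does not fit the incremental append pattern from the question.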

save writes two parts: a short header that contains information like dtype, shape and memory order (fortran_order), and a copy of the array's data buffer. The nbytes attribute gives the size of the data buffer. At least this is the case for numeric and string dtypes.
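To check what the header holds and how big the data part is, a small sketch: header_data_from_array_1_0 is a helper in np.lib.format that builds the dict np.save writes into the header (descr, fortran_order and shape). The header size printed at the end depends on the padding used by your NumPy version, so no particular value is claimed for it:

```python
import io
import numpy as np
from numpy.lib import format as npy_format

a = np.arange(1, 6, dtype=np.int32)

# The dict that np.save serializes into the header.
header = npy_format.header_data_from_array_1_0(a)
print(header)  # e.g. {'shape': (5,), 'fortran_order': False, 'descr': '<i4'} on little-endian

# The data block after the header is exactly a.nbytes bytes long.
print(a.nbytes, a.size * a.itemsize)  # 20 20

# Everything else np.save wrote is the preamble plus the (padded) header.
buf = io.BytesIO()
np.save(buf, a)
header_size = len(buf.getvalue()) - a.nbytes
print(header_size)
```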

The np.save docs have an example of using an open file, with seek(0) to rewind the file before calling load.
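A minimal sketch along the lines of that docstring example, but with two saves into the same temporary file. After a single seek(0), consecutive load calls return the arrays in the order they were saved, each one stopping at the end of its own block:

```python
from tempfile import TemporaryFile

import numpy as np

with TemporaryFile() as f:  # opened in "w+b" mode by default
    # Two saves into the same open file, one after the other.
    np.save(f, np.arange(1, 6))
    np.save(f, np.arange(6, 11))
    f.seek(0)  # rewind before loading
    a = np.load(f)  # reads the first self-contained block
    b = np.load(f)  # continues from where the first load stopped

print(a, b)  # [1 2 3 4 5] [ 6  7  8  9 10]
```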

The np.lib.npyio.format module (the same module as np.lib.format) has more information on the save format. It looks like it is possible to determine the length of the header by reading its first few bytes. You could probably use functions in that module to perform all these reads and calculations.
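Per the format description in that module, each block starts with the 6-byte magic string \x93NUMPY, one byte each for the major and minor version, and then the header length as a little-endian unsigned integer (2 bytes in version 1.0, 4 bytes in version 2.0 and later). In the dump from the question, the two bytes after \x01\x00 are F\x00, i.e. 70, so the data starts at byte 10 + 70 = 80. A minimal sketch that reads only those preamble bytes with struct (the helper name header_end is hypothetical):

```python
import io
import struct

import numpy as np

MAGIC = b"\x93NUMPY"


def header_end(f):
    """Parse the .npy preamble at f's current position.

    Returns the absolute offset where the data block starts and leaves f there.
    """
    start = f.tell()
    if f.read(len(MAGIC)) != MAGIC:
        raise ValueError("not at the start of a .npy block")
    major, _minor = f.read(2)  # one unsigned byte each
    if major == 1:
        (header_len,) = struct.unpack("<H", f.read(2))  # little-endian uint16
        preamble = 10  # 6 magic + 2 version + 2 length bytes
    else:
        (header_len,) = struct.unpack("<I", f.read(4))  # little-endian uint32
        preamble = 12  # 6 magic + 2 version + 4 length bytes
    data_start = start + preamble + header_len
    f.seek(data_start)
    return data_start


buf = io.BytesIO()
np.save(buf, np.arange(1, 6, dtype=np.int32))
buf.seek(0)
data_start = header_end(buf)
print(len(buf.getvalue()) - data_start)  # 20: five 4-byte int32 values
```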

If I read the whole file from the example, I get:

In [696]: f.read()
Out[696]: 
b"\x93NUMPY\x01\x00F\x00
{'descr': '<i4', 'fortran_order': False, 'shape': (5,), }\n
 \x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00
\x93NUMPY\x01\x00F\x00
{'descr': '<i4', 'fortran_order': False, 'shape': (5,), }\n
 \x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00\t\x00\x00\x00\n\x00\x00\x00"

I added line breaks to highlight the different parts of the file. Note that each save starts with \x93NUMPY.

With an open file f positioned at the start, I can read the header of the first array with:

In [707]: np.lib.npyio.format.read_magic(f)
Out[707]: (1, 0)
In [708]: np.lib.npyio.format.read_array_header_1_0(f)
Out[708]: ((5,), False, dtype('int32'))

and I can load the data with:

In [722]: np.fromfile(f, dtype=np.int32, count=5)
Out[722]: array([1, 2, 3, 4, 5])

I deduced this from the code of the np.lib.npyio.format.read_array function.
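Putting the steps of that session together, here is a simplified sketch of reading one block by hand (the helper name read_next is hypothetical; the real read_array also handles object arrays and other cases). It reads the header, computes the element count from the shape, and converts the bytes with np.frombuffer rather than np.fromfile, a substitution made here so the sketch also works on in-memory buffers:

```python
import io
import math

import numpy as np
from numpy.lib import format as npy_format


def read_next(f):
    """Read one array written by np.save, starting at f's current position.

    Simplified sketch: plain (non-object) dtypes, header versions 1.0/2.0.
    """
    version = npy_format.read_magic(f)
    if version == (1, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_1_0(f)
    elif version == (2, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_2_0(f)
    else:
        raise ValueError(f"unsupported .npy version {version}")
    if dtype.hasobject:
        raise ValueError("object arrays are pickled; use np.load instead")
    count = math.prod(shape)  # number of elements (1 for a 0-d array)
    data = f.read(count * dtype.itemsize)  # leaves f at the next block
    flat = np.frombuffer(data, dtype=dtype, count=count)
    # The header says whether the bytes are in C or Fortran order.
    return flat.reshape(shape, order="F" if fortran_order else "C")


buf = io.BytesIO()
np.save(buf, np.arange(1, 6))
np.save(buf, np.arange(6, 11))
buf.seek(0)
first = read_next(buf)
second = read_next(buf)
print(first, second)  # [1 2 3 4 5] [ 6  7  8  9 10]
```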

Now the file is at position:

In [724]: f.tell()
Out[724]: 100

which is the start of the next array:

In [725]: np.lib.npyio.format.read_magic(f)
Out[725]: (1, 0)
In [726]: np.lib.npyio.format.read_array_header_1_0(f)
Out[726]: ((5,), False, dtype('int32'))
In [727]: np.fromfile(f, dtype=np.int32, count=5)
Out[727]: array([ 6,  7,  8,  9, 10])

And we are at EOF.

Knowing that int32 takes 4 bytes, we can calculate that the 5-element data block occupies 20 bytes. So we could skip over an array by reading its header, calculating the size of the data block, and seeking past it to get to the next array. For small arrays that isn't worth the effort, but for very large ones it may be useful.
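A sketch of that skipping approach, which reads each header, seeks past each data block, and then loads only the last array (the helper load_last is hypothetical). It assumes every block has a version 1.0 or 2.0 header and a non-object dtype, because object arrays are pickled and their size cannot be computed from the header:

```python
import math
import os

import numpy as np
from numpy.lib import format as npy_format


def load_last(path):
    """Load only the last array in a file built by repeated np.save calls."""
    with open(path, "rb") as f:
        file_size = f.seek(0, os.SEEK_END)  # seek returns the new position
        f.seek(0)
        last_start = None
        while f.tell() < file_size:
            start = f.tell()
            version = npy_format.read_magic(f)
            if version == (1, 0):
                shape, _, dtype = npy_format.read_array_header_1_0(f)
            elif version == (2, 0):
                shape, _, dtype = npy_format.read_array_header_2_0(f)
            else:
                raise ValueError(f"unsupported .npy version {version}")
            if dtype.hasobject:
                raise ValueError("object arrays are pickled; size unknown")
            # Skip the data block instead of reading it.
            f.seek(math.prod(shape) * dtype.itemsize, os.SEEK_CUR)
            last_start = start
        if last_start is None:
            raise ValueError("no arrays found")
        f.seek(last_start)  # back to the last block's magic string
        return np.load(f)


# Usage: recreate the question's file, then load only the second array.
with open("try.npy", "wb") as f:
    np.save(f, [1, 2, 3, 4, 5])
    np.save(f, [6, 7, 8, 9, 10])

print(load_last("try.npy"))  # [ 6  7  8  9 10]
```

Only the small headers are read; the data of every earlier block is skipped with a relative seek.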
