Shuffling and importing a few rows of a saved numpy file

Question

I have two saved .npy files:

X_train - (18873, 224, 224, 3) - 21.2GB
Y_train - (18873,) - 148KB

X_train holds cat and dog images (cats in the first half, dogs in the second half, unshuffled), and Y_train maps each image to 0 or 1. Thus Y_train is [1,1,1,1,1,1,.........,0,0,0,0,0,0].

I want to randomly import 256 images from X (cats and dogs, close to a 50-50 split) together with their labels in Y. Because the data is large, I cannot load X_train into RAM.

So I tried this (1st approach):

import numpy as np
np.random.seed(666555)
X_train = np.load('Processed/X_train.npy', mmap_mode='r')
X = np.random.shuffle(X_train)
X = X[:256, :, :, :]
Y_train = np.load('Processed/Y_train.npy', mmap_mode='r')
Y = np.random.shuffle(Y_train)
Y = Y[:256]

This produces the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-68-8b2a13921b8d> in <module>
      2 np.random.seed(666555)
      3 X_train = np.load('Processed/X_train.npy', mmap_mode='r')
----> 4 X = np.random.shuffle(X_train)
      5 X = X[:256, :, :, :]
      6 Y_train = np.load('Processed/Y_train.npy', mmap_mode='r')

mtrand.pyx in numpy.random.mtrand.RandomState.shuffle()

mtrand.pyx in numpy.random.mtrand.RandomState.shuffle()

ValueError: assignment destination is read-only

I have also tried this (2nd approach):

import numpy as np
np.random.seed(666555)
X = np.memmap('Processed/X_train.npy', 'float64', shape = (256, 224, 224, 3), mode = 'c')
Y = np.memmap('Processed/Y_train.npy', 'float64', shape = (256), mode = 'c')
X = np.random.shuffle(X)
Y = np.random.shuffle(Y)
print(X)
print(Y)

Output:

None
None

In the 2nd approach I will get only cat images, because np.memmap maps only the first 256 images. Shuffling them is then of no use.

Please tell me how to do this, with any method.

Recommended answer

Your shuffling procedure is not correct. With this strategy, X and Y are shuffled independently of each other, so the correspondence between X and Y is lost after the shuffle. Here is a demonstrative example:

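import numpy as np

np.random.seed(666555)
xxx = np.asarray([1,2,3,4,5,6,7,8,9])
yyy = np.asarray([1,2,3,4,5,6,7,8,9])
# Each call draws its own permutation, so the two arrays end up in different orders
np.random.shuffle(xxx)
np.random.shuffle(yyy)

print((yyy == xxx).all()) # False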

Here is the correct procedure:

import numpy as np

np.random.seed(666555)
xxx = np.asarray([1,2,3,4,5,6,7,8,9])
yyy = np.asarray([1,2,3,4,5,6,7,8,9])
# Shuffle a single index array and use it to reorder both arrays
idx = np.arange(0,len(xxx))
np.random.shuffle(idx)

print((yyy[idx] == xxx[idx]).all()) # True
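
Applied to the memory-mapped files from the question, the same index idea might look like the sketch below. It is only an illustration under stated assumptions: the helper name `sample_rows`, the small synthetic stand-in files, and the choice to sort the indices before reading are not part of the original answer. Because indexing only reads from the read-only memory map, it should not trigger the "assignment destination is read-only" error that the in-place shuffle raised.

```python
import os
import tempfile

import numpy as np


def sample_rows(x_path, y_path, n_samples, seed=None):
    """Hypothetical helper: return n_samples random rows of X with matching labels."""
    # Memory-map both files read-only; no image data is loaded into RAM yet.
    X_mm = np.load(x_path, mmap_mode='r')
    Y_mm = np.load(y_path, mmap_mode='r')

    rng = np.random.RandomState(seed)
    # Draw distinct row indices; sorting them keeps the disk reads in file order.
    idx = np.sort(rng.choice(len(X_mm), size=n_samples, replace=False))

    # Fancy indexing copies only the selected rows into memory.
    X = X_mm[idx]
    Y = Y_mm[idx]

    # Shuffle the small batch, using ONE permutation for X, Y and the indices.
    perm = rng.permutation(n_samples)
    return X[perm], Y[perm], idx[perm]


# Small synthetic stand-ins for Processed/X_train.npy and Processed/Y_train.npy.
# Every "image" i is filled with the value i, so rows are easy to recognise.
X_full = np.arange(20, dtype=np.float32).reshape(20, 1, 1, 1) * np.ones((1, 4, 4, 3), dtype=np.float32)
Y_full = np.array([1] * 10 + [0] * 10)  # first half 1, second half 0, as in the question

with tempfile.TemporaryDirectory() as tmp:
    x_path = os.path.join(tmp, 'X_train.npy')
    y_path = os.path.join(tmp, 'Y_train.npy')
    np.save(x_path, X_full)
    np.save(y_path, Y_full)
    X_batch, Y_batch, rows = sample_rows(x_path, y_path, n_samples=8, seed=666555)

print(X_batch.shape, Y_batch.shape)
print(rows, Y_batch)
```

With the real files, the call would presumably be `sample_rows('Processed/X_train.npy', 'Processed/Y_train.npy', 256)`. Uniform sampling over both halves gives a split that is only roughly 50-50; an exactly balanced batch would need indices drawn separately from each half.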

This approach also avoids the None problem: np.random.shuffle shuffles its argument in place and returns None, so nothing should be assigned from its return value.
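
The None problem mentioned above can be reproduced in isolation. The short sketch below uses a local `np.random.RandomState` (an illustrative choice; the question uses the global `np.random` functions) to contrast `shuffle`, which works in place, with `permutation`, which returns a shuffled copy.

```python
import numpy as np

rng = np.random.RandomState(0)

a = np.arange(10)
# shuffle() reorders `a` in place and returns None,
# so writing `a = rng.shuffle(a)` would throw the data away.
result = rng.shuffle(a)
print(result)                                 # None
print(sorted(a.tolist()) == list(range(10)))  # True: same elements, new order

b = np.arange(10)
# permutation() leaves `b` untouched and returns a shuffled copy.
c = rng.permutation(b)
print((b == np.arange(10)).all())             # True
```

Either a shuffled index array, as in the answer, or `permutation` gives a value that can safely be assigned and reused for both X and Y.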
