Out of memory converting image files to numpy array
Problem description
I'm trying to run a loop that iterates through an image folder and returns two numpy arrays: x, which stores the images as a numpy array, and y, which stores the labels.
A folder can easily have over 40,000 RGB images, each 224x224 pixels. I have around 12 GB of memory, but after some iterations the used memory spikes and everything stops.
What can I do to fix this issue?
    import glob

    import cv2
    import numpy as np

    def create_set(path, quality):
        x_file = glob.glob(path + '*')
        x = []
        for i, img in enumerate(x_file):
            image = cv2.imread(img, cv2.IMREAD_COLOR)
            x.append(np.asarray(image))
            if i % 50 == 0:
                print('{} - {} images processed'.format(path, i))
        x = np.asarray(x)
        x = x/255
        y = np.zeros((x.shape[0], 2))
        if quality == 0:
            y[:,0] = 1
        else:
            y[:,1] = 1
        return x, y
Recommended answer
You just can't load that many images into memory at once. By appending each one to x, you are trying to hold every file in the given path in memory.
Try processing them in batches or, if you're doing this for a TensorFlow application, try writing them to .tfrecords first.
If you want to save some memory, leave the images as np.uint8 rather than casting them to float (which happens automatically when you normalise them in the line x = x/255).
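To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes exactly 40,000 images of 224x224 pixels with 3 channels (the figures from the question); the helper name array_gigabytes is made up for illustration.

```python
import numpy as np

def array_gigabytes(n_images, height, width, channels, dtype):
    """Size in GB (10**9 bytes) of an array shaped (n_images, height, width, channels)."""
    bytes_per_value = np.dtype(dtype).itemsize
    return n_images * height * width * channels * bytes_per_value / 1e9

# Figures from the question: 40,000 RGB images of 224x224.
as_uint8 = array_gigabytes(40_000, 224, 224, 3, np.uint8)      # about 6 GB
as_float64 = array_gigabytes(40_000, 224, 224, 3, np.float64)  # about 48 GB

# Dividing a uint8 array by an int promotes the result to float64.
small = np.zeros((2, 224, 224, 3), dtype=np.uint8)
print((small / 255).dtype)  # float64
print(round(as_uint8, 1), round(as_float64, 1))
```

So the uint8 data alone already takes about half of a 12 GB machine, and the float64 result of x/255 is eight times larger and cannot fit.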
You also don't need np.asarray in your x.append(np.asarray(image)) line. The image returned by cv2.imread is already an array. np.asarray is for converting lists, tuples, etc. to arrays.
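A quick check of that point, as a small sketch. The zero-filled array here is only a stand-in for what cv2.imread returns on a successful read, so that the snippet runs without any image files.

```python
import numpy as np

# Stand-in for a successfully loaded image: a uint8 array of shape (H, W, 3).
image = np.zeros((224, 224, 3), dtype=np.uint8)

# Passing an existing ndarray through np.asarray hands back the same object.
same = np.asarray(image)
print(same is image)  # True: no conversion, no copy

# np.asarray does real work on lists and tuples.
from_list = np.asarray([[1, 2], [3, 4]])
print(type(from_list).__name__)  # ndarray
```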
A very rough batching example:
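    def batching_function(imlist, batchsize):
        # Take the first `batchsize` items and return them with the remaining list.
        ims = []
        batch = imlist[:batchsize]
        for image in batch:
            ims.append(image)
            other_processing()  # placeholder for any per-image work
        new_imlist = imlist[batchsize:]
        return ims, new_imlist

    def main():
        batchsize = 64  # example value, pick what fits in memory
        imlist = all_the_globbing_here()  # placeholder: build the list of files
        # Ceiling division so that the final partial batch is processed too.
        for i in range((len(imlist) + batchsize - 1) // batchsize):
            ims, imlist = batching_function(imlist, batchsize)
            process_images(ims)  # placeholder for whatever consumes a batch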
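The same idea can also be written as a generator, so only one batch of pixel data exists at a time. This is a minimal sketch, not the only way to do it: iter_batches and fake_load are hypothetical names, the file names are made up, and fake_load stands in for a real loader such as cv2.imread(path, cv2.IMREAD_COLOR) so that the snippet runs without any images on disk.

```python
import numpy as np

def iter_batches(paths, batch_size, load_image):
    """Yield uint8 arrays of shape (n, H, W, C), one batch at a time."""
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        # Only this batch is built here; earlier batches can be garbage-collected.
        yield np.stack([load_image(p) for p in chunk])

def fake_load(path):
    # Hypothetical stand-in for cv2.imread(path, cv2.IMREAD_COLOR).
    return np.zeros((224, 224, 3), dtype=np.uint8)

paths = ["img_{}.jpg".format(i) for i in range(10)]  # made-up file names
for batch in iter_batches(paths, batch_size=4, load_image=fake_load):
    # Normalise per batch, and in float32, instead of converting the whole set.
    scaled = batch.astype(np.float32) / 255
    print(batch.shape, scaled.dtype)
```

Slicing by index handles the final partial batch without extra arithmetic, and the images stay as uint8 until the moment a batch is actually used.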