Out of memory converting image files to numpy array

Question

I'm trying to run a loop that iterates through an image folder and returns two numpy arrays: x, which stores the images as numpy arrays, and y, which stores the labels.

A folder can easily have over 40,000 RGB images with dimensions (224, 224). I have around 12 GB of memory, but after some iterations the used memory just spikes up and everything stops.

What can I do to fix this issue?

import glob

import cv2
import numpy as np


def create_set(path, quality):
    x_file = glob.glob(path + '*')
    x = []

    for i, img in enumerate(x_file):
        image = cv2.imread(img, cv2.IMREAD_COLOR)
        x.append(np.asarray(image))
        if i % 50 == 0:
            print('{} - {} images processed'.format(path, i))

    x = np.asarray(x)
    x = x/255

    # one-hot label: column 0 for quality == 0, column 1 otherwise
    y = np.zeros((x.shape[0], 2))
    if quality == 0:
        y[:,0] = 1
    else:
        y[:,1] = 1

    return x, y

Answer

You just can't load that many images into memory. You're trying to load every file in a given path to memory, by appending them to x.

Try processing them in batches, or if you're doing this for a TensorFlow application, try writing them to .tfrecords first.
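If you do go the .tfrecords route, here is a minimal sketch of what that could look like (assuming TensorFlow 2.x; the output file name, feature keys and integer label are illustrative, not from the original answer):

import glob

import cv2
import tensorflow as tf


def write_tfrecords(path, label, out_file='images.tfrecords'):
    # Stream one image at a time to disk instead of holding them all in memory.
    with tf.io.TFRecordWriter(out_file) as writer:
        for img_path in glob.glob(path + '*'):
            image = cv2.imread(img_path, cv2.IMREAD_COLOR)
            ok, encoded = cv2.imencode('.jpg', image)  # store compressed bytes
            if not ok:
                continue
            example = tf.train.Example(features=tf.train.Features(feature={
                'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded.tobytes()])),
                'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
            }))
            writer.write(example.SerializeToString())

The records can then be read back lazily with tf.data.TFRecordDataset, so the full set never has to sit in RAM at once.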

If you want to save some memory, leave the images as np.uint8 rather than casting them to float (which happens automatically when you normalise them with x = x/255).
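To put numbers on that (using the 40,000-image figure from the question; the array shapes and batch size below are just examples): 40,000 × 224 × 224 × 3 bytes is roughly 6 GB as uint8, while x = x/255 promotes the array to float64 at 8 bytes per value, i.e. roughly 48 GB. A rough sketch of keeping uint8 and normalising only per batch:

import numpy as np

# stored images stay uint8 (1 byte per value)
x = np.zeros((1000, 224, 224, 3), dtype=np.uint8)

# convert and normalise only the batch you are about to feed to the model
batch = x[:64].astype(np.float32) / 255.0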

You also don't need np.asarray in your x.append(np.asarray(image)) line. image is already an array; np.asarray is for converting lists, tuples, etc. to arrays.
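A quick way to see that (toy values):

import numpy as np

a = np.zeros((224, 224, 3), dtype=np.uint8)
print(np.asarray(a) is a)        # True - asarray returns the same array, no copy is made
print(type(np.asarray([1, 2])))  # <class 'numpy.ndarray'> - a list does get converted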

A very rough batching example:

def batching_function(imlist, batchsize):
    ims = []
    batch = imlist[:batchsize]        # take the next batchsize paths

    for image in batch:
        ims.append(image)
        other_processing()

    new_imlist = imlist[batchsize:]   # drop the paths that have been handled
    return ims, new_imlist

def main():
    imlist = all_the_globbing_here()
    batchsize = 64                    # pick whatever comfortably fits in memory
    for i in range(len(imlist) // batchsize):
        ims, imlist = batching_function(imlist, batchsize)
        process_images(ims)
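The same idea reads a little more naturally as a generator; here is a sketch of that (the glob pattern, batch size and process_images placeholder are illustrative, mirroring the rough example above):

import glob

import cv2
import numpy as np


def batches(path, batchsize):
    paths = glob.glob(path + '*')
    for start in range(0, len(paths), batchsize):
        # Only one batch of decoded images is ever held in memory.
        ims = [cv2.imread(p, cv2.IMREAD_COLOR) for p in paths[start:start + batchsize]]
        yield np.asarray(ims, dtype=np.uint8)


for batch in batches('images/good/', 64):
    process_images(batch)  # placeholder, as in the example above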

