cPickle very large amount of data


Problem Description

I have about 0.8 million images of 256x256 in RGB, which amount to over 7GB.

I want to use them as training data in a Convolutional Neural Network, and want to put them in a cPickle file, along with their labels.

Now, this takes so much memory that it swaps to my hard drive and nearly fills it.

Is this a bad idea?

What would be a smarter/more practical way to load them into the CNN, or to pickle them, without causing too many memory issues?

Here is what the code looks like:

import numpy as np
import cPickle
from PIL import Image
import sys, os

pixels = []
labels = []
traindata = []
data = []

# Walk the image directory and load every .jpg into memory at once
for subdir, dirs, files in os.walk('images'):
    for file in files:
        if file.endswith(".jpg"):
            floc = os.path.join(subdir, file)
            im = Image.open(floc)
            pix = np.array(im.getdata())
            pixels.append(pix)
            labels.append(1)

pixels = np.array(pixels)
labels = np.array(labels)
traindata.append(pixels)
traindata.append(labels)
traindata = np.array(traindata)
.....# do the same for validation and test data
.....# put all data and labels into 'data' array
cPickle.dump(data, open('data.pkl', 'wb'))

Answer

Is this a bad idea?

Yes.

You are trying to load 7GB of compressed image data into memory all at once; uncompressed, 800k 256x256 RGB images come to roughly 157 GB (800,000 × 256 × 256 × 3 bytes). This will not work. You have to find a way to update your CNN image-by-image, saving the state as you go along.
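
As a rough sketch of that idea, a generator can walk the image directory and yield small mini-batches, so only one batch is ever in memory. It assumes the 'images' directory and the single label from the question's code; the commented-out partial_fit call is a stand-in for whatever incremental training API your CNN exposes:

import os
import numpy as np
from PIL import Image

def iter_image_batches(root='images', batch_size=128):
    # Yield (pixels, labels) mini-batches instead of one giant array.
    batch, labels = [], []
    for subdir, dirs, files in os.walk(root):
        for fname in files:
            if fname.endswith('.jpg'):
                im = Image.open(os.path.join(subdir, fname))
                # uint8 keeps each image at 256*256*3 bytes (~192 KB)
                batch.append(np.asarray(im, dtype=np.uint8))
                labels.append(1)  # single class, as in the question
                if len(batch) == batch_size:
                    yield np.array(batch), np.array(labels)
                    batch, labels = [], []
    if batch:  # flush the remainder
        yield np.array(batch), np.array(labels)

# Hypothetical usage: train incrementally, one mini-batch at a time.
# for pixels, lbls in iter_image_batches():
#     cnn.partial_fit(pixels, lbls)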

Also consider how large your CNN parameter set will be. Pickle is not designed for large amounts of data. If you need to store gigabytes' worth of neural net data, you're much better off using a database. If the neural net parameter set is only a few MB, though, pickle will be fine.
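
If you do end up with gigabytes of arrays, one possibility (my suggestion, not something the answer names) is an HDF5 file via h5py: the dataset lives on disk and you write and read slices without ever holding the whole array in memory.

import h5py
import numpy as np

n_images = 800000  # from the question

# Create on-disk datasets; only the slices you touch enter RAM.
with h5py.File('data.h5', 'w') as f:
    pixels = f.create_dataset('pixels', shape=(n_images, 256, 256, 3),
                              dtype='uint8', chunks=(128, 256, 256, 3))
    labels = f.create_dataset('labels', shape=(n_images,), dtype='int32')
    # Fill incrementally, e.g. from the batch generator above:
    # for i, (batch, lbls) in enumerate(iter_image_batches()):
    #     pixels[i * 128:i * 128 + len(batch)] = batch
    #     labels[i * 128:i * 128 + len(batch)] = lbls

# Later, read just one slice into memory:
with h5py.File('data.h5', 'r') as f:
    first_batch = f['pixels'][:128]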

You might also want to take a look at the documentation for pickle.HIGHEST_PROTOCOL, so you are not stuck with an old and unoptimized pickle file format.
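
For the small-parameter-set case where pickle is appropriate, selecting the highest protocol is a one-argument change (the parameter dict below is a made-up example):

import cPickle
import numpy as np

# Hypothetical small parameter set; pickle is fine at this scale.
params = {'weights': np.zeros((10, 10)), 'biases': np.zeros(10)}

# The default (protocol 0) is a slow text format;
# HIGHEST_PROTOCOL selects the optimized binary format.
with open('params.pkl', 'wb') as f:
    cPickle.dump(params, f, cPickle.HIGHEST_PROTOCOL)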
