python pickle - dumping a very huge list


Problem description

I have two directories, each containing about 50,000 images, mostly 240x180 in size.

I want to pickle their pixel data as training, validation, and test sets,

but this apparently turns out to be very, very large, and eventually causes the computer either to freeze or to run out of disk space.

When the computer froze, the partially generated pkl file was 28GB.

I'm not sure if it is supposed to be this large.

Am I doing something wrong? Or is there a more efficient way to do this?

from PIL import Image
import pickle
import os

indir1 = 'Positive'
indir2 = 'Negative'

trainimage = []
trainpixels = []
trainlabels = []
validimage = []
validpixels = []
validlabels = []
testimage = []
testpixels = []
testlabels = []


i=0
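# Positive directory: first 40,000 images go to training, the next 5,000 to validation, the rest to test (label 0)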
for (root, dirs, filenames) in os.walk(indir1):
    print 'hello'
    for f in filenames:
        try:
            im = Image.open(os.path.join(root,f))
            if i<40000:
                trainpixels.append(im.tostring())
                trainlabels.append(0)
            elif i<45000:
                validpixels.append(im.tostring())
                validlabels.append(0)
            else:
                testpixels.append(im.tostring())
                testlabels.append(0)
            print str(i)+'\t'+str(f)
            i+=1
        except IOError:
            continue

i=0
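# Negative directory: same split, labelled 1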
for (root, dirs, filenames) in os.walk(indir2):
    print 'hello'
    for f in filenames:
        try:
            im = Image.open(os.path.join(root,f))
            if i<40000:
                trainpixels.append(im.tostring())
                trainlabels.append(1)
            elif i<45000:
                validpixels.append(im.tostring())
                validlabels.append(1)
            else:
                testpixels.append(im.tostring())
                testlabels.append(1)
            print str(i)+'\t'+str(f)
            i+=1
        except IOError:
            continue

trainimage.append(trainpixels)
trainimage.append(trainlabels)
validimage.append(validpixels)
validimage.append(validlabels)
testimage.append(testpixels)
testimage.append(testlabels)

output=open('data.pkl','wb')

pickle.dump(trainimage,output)
pickle.dump(validimage,output)
pickle.dump(testimage,output)
output.close()

Answer

The pickle file format isn't particularly efficient, especially not for images. Even if your pixels were stored as 1 byte per pixel, you would have

50,000 × 240 × 180 = 2,160,000,000

so 2 GB. Your pixels undoubtedly take more space than that; I'm not sure what the PIL tostring() method actually does to an image, but it's entirely plausible that your resulting file could be in the tens of gigabytes.
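
As a rough sanity check (a sketch not in the original answer; 'sample.jpg' is a hypothetical path), you can measure how many bytes one decoded image occupies. Note that in current Pillow the method is tobytes(); tostring() is its deprecated name:

from PIL import Image

# Hypothetical sample image; any of the 240x180 images would do.
im = Image.open('sample.jpg')
raw = im.tobytes()  # raw pixel bytes; tostring() is the deprecated alias
print(len(raw))           # 240 * 180 * 3 = 129,600 bytes for an RGB image
print(len(raw) * 100000)  # ~13 GB of raw pixels across both directories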

You may want to consider a storage method other than pickle. For example, what would be wrong with simply storing the files on disk in their native image format, and pickling a list of the file names?
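
A minimal sketch of that idea (not part of the original answer; 'Positive' and 'Negative' are the question's directory names, 'filenames.pkl' is hypothetical): pickle only (path, label) pairs and decode images lazily, one at a time, when you actually need the pixels:

import os
import pickle

# Collect (path, label) pairs instead of raw pixel data:
# label 0 for Positive, label 1 for Negative, as in the question.
samples = []
for label, indir in enumerate(['Positive', 'Negative']):
    for root, dirs, filenames in os.walk(indir):
        for f in filenames:
            samples.append((os.path.join(root, f), label))

# The pickle now holds only short strings and ints -- a few MB at most.
with open('filenames.pkl', 'wb') as output:
    pickle.dump(samples, output)

Later, Image.open() each path as you iterate over the loaded list, so only one image is decoded in memory at a time.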
