How to load a percentage of data with sklearn.datasets.load_files


Problem description

I have 8,000 images which I am loading with sklearn.datasets.load_files and passing through ResNet from Keras to get bottleneck features. However, this task is taking hours on a GPU, so I'd like to find out whether there is a way to tell load_files to load only a percentage of the data, e.g. 20%.

I'm doing this to train my own top layer (the last dense layer) and attach it to ResNet.

import numpy as np
from sklearn.datasets import load_files
from keras.utils import np_utils

def load_dataset(path):
    # load_files collects every file path and the class (sub-folder) index for each
    data = load_files(path)
    files = np.array(data['filenames'])
    # one-hot encode the integer targets (100 classes)
    targets = np_utils.to_categorical(np.array(data['target']), 100)
    return files, targets

train_files, train_targets = load_dataset('images/train')
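
For reference, the bottleneck-feature step described above might look roughly like this. This is only a sketch: it assumes ResNet50 from keras.applications is the ResNet in use, and the helper name and batch size are illustrative.

import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input
from keras.preprocessing import image

# pooling='avg' collapses the final convolutional maps into one feature vector per image
resnet = ResNet50(weights='imagenet', include_top=False, pooling='avg')

def bottleneck_features(paths, batch_size=32):
    # illustrative helper: load and preprocess images in batches, then run them through ResNet
    features = []
    for start in range(0, len(paths), batch_size):
        batch = [image.img_to_array(image.load_img(p, target_size=(224, 224)))
                 for p in paths[start:start + batch_size]]
        features.append(resnet.predict(preprocess_input(np.array(batch))))
    return np.vstack(features)

# e.g. train_features = bottleneck_features(train_files)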

Recommended answer

This sounds like it would be better suited to the Keras ImageDataGenerator class, using its ImageDataGenerator.flow_from_directory method. You don't have to use data augmentation with it (which would slow things down further), but you can choose the batch size to pull from the directory instead of loading all the images at once.

Copied from https://keras.io/preprocessing/image/ and slightly modified with notes.

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(  # <- customize your transformations
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,  # <- control how many images are loaded each batch
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

# `model` is assumed to be your compiled Keras model (e.g. ResNet with your own dense top layer)
model.fit_generator(
        train_generator,
        steps_per_epoch=2000,  # <- reduce here to lower the overall images used
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800)

Edit

Per your question below... steps_per_epoch determines how many batches are loaded for each epoch.

For example:

  • steps_per_epoch = 50
  • batch_size = 32
  • epochs = 1

That would give you 1,600 images in total for that epoch, which is exactly 20% of your 8,000 images. Note that if you run into memory problems with a batch size of 32, you may want to decrease the batch size and increase steps_per_epoch. It will take some tinkering to get it right.
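
As a quick sanity check of that arithmetic (plain Python, using the numbers above):

batch_size = 32
steps_per_epoch = 50
total_images = 8000

images_per_epoch = batch_size * steps_per_epoch    # 32 * 50 = 1600
fraction_used = images_per_epoch / total_images    # 1600 / 8000 = 0.2, i.e. 20%
print(images_per_epoch, fraction_used)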

