Loading images from a directory in Keras for a CNN when the labels are in a CSV file

Question

I have my images in the directory train_images = './data/images' and the labels in train_labels = './data/labels.csv'.

For example, there are 1000 images in train_images, named 377.jpg, 17814.jpg, and so on. The class each image belongs to is stored in a separate CSV file.

EDIT: Here are a few rows from the CSV file:

    ID          Class
0   377.jpg     MIDDLE
1   17814.jpg   YOUNG
2   21283.jpg   MIDDLE
3   16496.jpg   YOUNG
4   4487.jpg    MIDDLE

Here ID is the image file name and Class is the class the image belongs to.

I could have used the usual

ImageDataGenerator().flow_from_directory(train_images, class_mode='binary', batch_size=64)

but the problem is that the labels are in a CSV file, while flow_from_directory infers each image's class from the subdirectory the image sits in. What I could do is use os to rename all the files, move them into a separate directory per class, and then load them, but that feels clumsy and hacky.
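
For comparison, the directory-per-class workaround described above could look roughly like this. It is a minimal sketch, assuming the CSV has a header row with ID and Class columns (the row numbers 0, 1, 2... in the sample look like pandas display output, not part of the file). The function name split_into_class_dirs is hypothetical, and it copies rather than renames so the original folder is left untouched.

```python
import csv
import os
import shutil


def split_into_class_dirs(image_dir, labels_csv, out_dir,
                          id_col="ID", class_col="Class"):
    """Copy each image into out_dir/<class>/ so flow_from_directory can read it.

    Hypothetical helper; assumes labels_csv has a header row containing
    id_col (the image file name) and class_col (the label).
    Returns the number of images copied.
    """
    copied = 0
    with open(labels_csv, newline="") as f:
        for row in csv.DictReader(f):
            # One subdirectory per class, created on first use
            class_dir = os.path.join(out_dir, row[class_col])
            os.makedirs(class_dir, exist_ok=True)
            # shutil.copy keeps the original file name inside class_dir
            shutil.copy(os.path.join(image_dir, row[id_col]), class_dir)
            copied += 1
    return copied
```

After this, ImageDataGenerator().flow_from_directory(out_dir, ...) would find one class per subdirectory. The cost is a second copy of every image on disk, which is one reason the DataFrame-based approach in the answer is usually preferable.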

How can I load this data in Keras for a CNN, where each image has dimensions (h, w, c)?

Recommended answer

Here's my example, which uses the flow_from_dataframe method of ImageDataGenerator, with pandas to read the CSV. The CSV I was using had two columns:

x_col="Image"
y_col="Id"

So the first column is the filename, e.g. xxxx.jpg, and the second column is the class. Since this data comes from the Kaggle humpback whale challenge, the class is which whale appears in the picture. The image files are in the directory "../input/humpback-whale-identification/train/".

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Activation, Conv2D, Flatten,
                                     Dropout, MaxPooling2D, BatchNormalization)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Import from tensorflow.keras throughout; mixing in standalone keras can break
from tensorflow.keras import regularizers, optimizers
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

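So read the CSV using pandas. dtype=str keeps every column as strings, which flow_from_dataframe requires for the label column when class_mode is "categorical":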

traindf = pd.read_csv('../input/humpback-whale-identification/train.csv', dtype=str)

Now create the generator with ImageDataGenerator:

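# Rescale pixels to [0, 1] and hold out 25% of the rows for validation
datagen = ImageDataGenerator(rescale=1./255., validation_split=0.25)
train_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory="../input/humpback-whale-identification/train/",
    x_col="Image",            # column holding the image filenames
    y_col="Id",               # column holding the class labels
    subset="training",        # the remaining 75% of the rows
    batch_size=32,
    seed=42,
    shuffle=True,
    class_mode="categorical", # labels are returned as one-hot vectors
    target_size=(100, 100))   # every image is resized to 100x100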
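
With class_mode="categorical", each label comes out of the generator as a one-hot vector, so predictions have to be mapped back to class names. As far as I know, when no classes list is passed, Keras sorts the distinct y_col values and numbers them from 0; treat that as an assumption and check train_generator.class_indices on your version. The plain-Python sketch below reproduces that mapping for the question's MIDDLE/YOUNG labels; build_class_indices, one_hot and decode are hypothetical helper names, not Keras functions.

```python
def build_class_indices(labels):
    """Map each distinct label to an integer index, in sorted order.

    Assumed to mirror the class_indices dict that flow_from_dataframe
    builds when no explicit `classes` list is given.
    """
    return {name: i for i, name in enumerate(sorted(set(labels)))}


def one_hot(label, class_indices):
    """Return the one-hot vector for `label` as a list of floats."""
    vec = [0.0] * len(class_indices)
    vec[class_indices[label]] = 1.0
    return vec


def decode(probabilities, class_indices):
    """Turn one row of model.predict() output back into a class name."""
    index_to_class = {i: name for name, i in class_indices.items()}
    # Index of the highest probability (first one wins on ties)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return index_to_class[best]


labels = ["MIDDLE", "YOUNG", "MIDDLE", "YOUNG", "MIDDLE"]
class_indices = build_class_indices(labels)
print(class_indices)                              # {'MIDDLE': 0, 'YOUNG': 1}
print(one_hot("YOUNG", class_indices))            # [0.0, 1.0]
print(decode([0.2, 0.8], class_indices))          # YOUNG
```

In real code you would usually skip build_class_indices and invert train_generator.class_indices directly, since that is the mapping the model was actually trained with.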

Sometimes the filenames/IDs in the CSV don't have an extension. In that case I used the following to add one. Run it before calling flow_from_dataframe, because the names in x_col must match the files on disk:

def append_ext(fn):
    return fn+".jpg"

traindf["Image"]=traindf["Image"].apply(append_ext)
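
Note that append_ext adds ".jpg" unconditionally, so a name that already ends in ".jpg" would become "xxxx.jpg.jpg". Also, as far as I know, flow_from_dataframe skips filenames it cannot find (with a warning) instead of raising, so mismatches can silently shrink the dataset. A minimal sketch of a safer variant plus a quick existence check; ensure_ext and missing_files are hypothetical names, and ".jpg" as the default extension is an assumption.

```python
import os


def ensure_ext(filename, ext=".jpg"):
    """Append `ext` only if `filename` has no extension already."""
    return filename if os.path.splitext(filename)[1] else filename + ext


def missing_files(filenames, directory):
    """Return the filenames that do not exist as files inside `directory`."""
    return [name for name in filenames
            if not os.path.isfile(os.path.join(directory, name))]
```

With pandas this would be used as traindf["Image"] = traindf["Image"].apply(ensure_ext), followed by missing_files(traindf["Image"], directory) to confirm that the list comes back empty before training.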

Well, I hope that is helpful! It's my first try at answering a question here :-)

The Kaggle dataset/challenge is here https://www.kaggle.com/c/humpback-whale-identification

Note: I've seen people doing this in all kinds of ways on kaggle! But this seems the easiest!
