Keras ImageDataGenerator equivalent for csv files

Problem description

I have a bunch of data organized in folders, as shown in the following picture:

I need to build a data iterator to feed the data to a neural network model. I have found many examples that solve this problem for images, using the Keras class ImageDataGenerator and its method flow_from_directory, but none for data stored as csv files.

Each csv file is a 512x11 float array that represents the power acquired by a sensor. I thought about converting each of these csv files to an image format and then applying the ImageDataGenerator class, but the conversion would lose information: in a typical image each value is stored as an 8-bit integer, while my data are 32-bit floats.
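To make the precision argument concrete, the sketch below round-trips a random 512x11 float32 array (a stand-in for one sensor file, not real data) through 8-bit storage. The min/max linear rescaling to 0..255 is an assumption about how such a conversion would be done; whatever mapping is used, at most 256 distinct levels can survive.

```python
import numpy as np

# Hypothetical sensor data: random values stand in for one 512x11 csv file
rng = np.random.default_rng(0)
data = rng.normal(size=(512, 11)).astype(np.float32)


def to_uint8(x):
    """Linearly rescale x to 0..255 and round to 8-bit integers, as an image would store it."""
    lo, hi = float(x.min()), float(x.max())
    scaled = np.round((x - lo) / (hi - lo) * 255).astype(np.uint8)
    return scaled, lo, hi


def from_uint8(q, lo, hi):
    """Map the 8-bit values back to the original float range."""
    return q.astype(np.float32) / 255 * (hi - lo) + lo


q, lo, hi = to_uint8(data)
restored = from_uint8(q, lo, hi)

# Only 256 levels remain, so the round trip cannot be exact
max_error = float(np.abs(restored - data).max())
step = (hi - lo) / 255
print(f"distinct values: {len(np.unique(q))}, max error: {max_error:.4f}, step: {step:.4f}")
```

The worst-case error is half a quantization step, i.e. (max - min) / 510, which is why reading the raw float32 arrays directly is preferable.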

So, is there an equivalent to ImageDataGenerator in Keras for loading csv files instead of images?

Solution

Yes, you can write your own generator by subclassing keras.utils.Sequence. The idea is to build a dataframe (for instance, a pandas DataFrame) with two columns: one with the labels and one with the paths to your csv files. The data generator uses this dataframe to determine the length of the dataset (the number of csv files) and to read the files in batches and pass them to the model.
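The dataframe described above could be built from the folder layout with a short helper. This is a sketch under stated assumptions: the hypothetical build_file_dataframe expects one subfolder per class (similar to the layout flow_from_directory uses), sorts the class names alphabetically and maps them to integer labels. The column names 'label' and 'file_names' match the ones used by the Sequence code.

```python
import tempfile
from pathlib import Path

import pandas as pd


def build_file_dataframe(root_dir):
    """Return (df, class_names) for a layout like root_dir/<class_name>/<file>.csv.

    Assumption: one subfolder per class. Classes are sorted alphabetically
    and mapped to integer labels 0, 1, 2, ...
    """
    class_dirs = sorted(p for p in Path(root_dir).iterdir() if p.is_dir())
    class_names = [p.name for p in class_dirs]
    rows = []
    for label, class_dir in enumerate(class_dirs):
        for csv_path in sorted(class_dir.glob('*.csv')):
            rows.append({'label': label, 'file_names': str(csv_path)})
    df = pd.DataFrame(rows, columns=['label', 'file_names'])
    return df, class_names


# Demo on a throwaway directory tree with hypothetical classes 'class_a' and 'class_b'
with tempfile.TemporaryDirectory() as tmp:
    for name, n_files in [('class_a', 2), ('class_b', 3)]:
        (Path(tmp) / name).mkdir()
        for i in range(n_files):
            (Path(tmp) / name / f'sample_{i}.csv').write_text('0.0,1.0\n')
    df, class_names = build_file_dataframe(tmp)
    print(class_names)           # ['class_a', 'class_b']
    print(df['label'].tolist())  # [0, 0, 1, 1, 1]
```

The resulting df can be passed directly as the df argument of DataSequence.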

Your code could look something like this:

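import math

import numpy as np
from tensorflow.keras.utils import Sequence


def READ_CSV_FUNCTION(path):
    # Default reader (replace with your own): loads one csv file as a float32 array,
    # assuming comma-separated values and no header row
    return np.loadtxt(path, delimiter=',', dtype=np.float32)


class DataSequence(Sequence):
    """
    Keras Sequence object to train a model on a list of csv files
    """
    def __init__(self, df, batch_size, mode='train'):
        """
        df = dataframe with two columns: 'label' (the labels) and 'file_names' (paths to the csv files)
        """
        super().__init__()
        self.df = df
        self.bsz = batch_size
        self.mode = mode

        # Keep the labels and the csv file locations in memory (the data itself is read lazily)
        self.labels = self.df['label'].values
        self.file_list = self.df['file_names'].values
        # Build the initial sample order (shuffled in training mode)
        self.on_epoch_end()

    def __len__(self):
        # Number of batches per epoch
        return int(math.ceil(len(self.df) / float(self.bsz)))

    def on_epoch_end(self):
        self.indexes = np.arange(len(self.file_list))
        if self.mode == 'train':
            # Shuffle indexes after each epoch if in training mode
            np.random.shuffle(self.indexes)

    def get_batch_indexes(self, idx):
        # Sample positions that belong to batch number idx
        return self.indexes[idx * self.bsz: (idx + 1) * self.bsz]

    def get_batch_labels(self, idx):
        # Fetch a batch of labels
        return self.labels[self.get_batch_indexes(idx)]

    def get_batch_features(self, idx):
        # Fetch a batch of inputs, reading one csv file per sample
        return np.array([READ_CSV_FUNCTION(f) for f in self.file_list[self.get_batch_indexes(idx)]])

    def __getitem__(self, idx):
        batch_x = self.get_batch_features(idx)
        batch_y = self.get_batch_labels(idx)
        return batch_x, batch_y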

You would just need to replace READ_CSV_FUNCTION with your function of choice to read and parse the csv files.
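One possible choice for READ_CSV_FUNCTION is a small pandas-based reader like the hypothetical read_sensor_csv below. It assumes headerless, comma-separated files and checks the 512x11 shape from the question, so a malformed file fails loudly instead of producing a ragged batch.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def read_sensor_csv(path, expected_shape=(512, 11)):
    """Read one csv file into a float32 NumPy array.

    Assumes comma-separated values and no header row; adjust header/sep if your files differ.
    """
    array = pd.read_csv(path, header=None, dtype=np.float32).to_numpy()
    if expected_shape is not None and array.shape != tuple(expected_shape):
        raise ValueError(f'{path}: expected shape {expected_shape}, got {array.shape}')
    return array


# Demo: write three random 512x11 files, then read them back as one batch
rng = np.random.default_rng(1)
originals = []
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        path = Path(tmp) / f'sample_{i}.csv'
        values = rng.normal(size=(512, 11))
        np.savetxt(str(path), values, delimiter=',')
        originals.append(values)
        paths.append(path)
    # Same pattern as get_batch_features: one array per file, stacked along a new first axis
    batch = np.array([read_sensor_csv(p) for p in paths])

print(batch.shape, batch.dtype)  # (3, 512, 11) float32
```

If the model starts with a 2D convolution, the batch needs a trailing channel axis, e.g. np.expand_dims(batch, -1) for shape (3, 512, 11, 1).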
