How can I explore and modify the created dataset from tf.keras.preprocessing.image_dataset_from_directory()?

Question

Here's how I used the function:

dataset = tf.keras.preprocessing.image_dataset_from_directory(
    main_directory,
    labels='inferred',
    image_size=(299, 299),
    validation_split=0.1,
    subset='training',
    seed=123
)

I'd like to explore the created dataset much like in this example, particularly the part where it was converted to a pandas dataframe. But my minimum goal is to check the labels and the number of files attached to each, just to verify that it created the dataset as expected (with each sub-directory being the label of the images inside it).

To be clear, the main_directory is set up like this:

main_directory
- class_a
  - 000.jpg
  - ...
- class_b
  - 100.jpg
  - ...

And I'd like to see the dataset display its info with something like this:

label     number of images
class_a   100
class_b   100
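One way to get exactly this kind of summary, independent of TensorFlow (and not part of the original answer), is to count the files per class folder directly on disk. A minimal sketch, assuming the directory layout shown above; the function name is illustrative:

```python
from collections import Counter
from pathlib import Path

def count_images_per_class(main_directory):
    """Count .jpg files in each class sub-directory of main_directory."""
    counts = Counter()
    # Each match is main_directory/<class_name>/<image>.jpg;
    # the parent folder name is the class label.
    for img in Path(main_directory).glob('*/*.jpg'):
        counts[img.parent.name] += 1
    return dict(counts)
```

The resulting dict can be passed straight to pandas (e.g. via `pandas.Series`) if a table view like the one above is wanted.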

Additionally, is it possible to remove labels and corresponding images in a dataset? The idea is to drop them if the corresponding number of images is less than a certain number, or by a different metric. It can of course be done outside this function through other means, but I'd like to know if it is indeed possible, and if so, how.
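Doing this on the filename list, before any dataset is built, is straightforward. A sketch of one possible filter (the function name and threshold parameter are illustrative, not from the original answer):

```python
import os
from collections import Counter

def drop_small_classes(file_paths, min_count):
    """Keep only files whose class folder holds at least min_count images."""
    # The directory part of each path identifies the class.
    counts = Counter(os.path.dirname(p) for p in file_paths)
    return [p for p in file_paths if counts[os.path.dirname(p)] >= min_count]
```

The same pattern works for any other per-class metric: compute it per folder first, then filter the path list against it.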

For additional context, the end goal of all of this is to train a pre-trained model like this with local images divided into folders named after their classes. If there is a better way that does not use this function but meets this end goal, it's welcome all the same. Thanks!

Answer

I think it would be much easier to use glob2 to get all your filenames, process them as you want, and then write a simple loading function to replace image_dataset_from_directory.

Get all the files:

import glob2

files = glob2.glob('class_*\\*.jpg')  # Windows-style separator; use 'class_*/*.jpg' on POSIX

Then manipulate this list of filenames as desired.

Then, make a function to load the images:

import os
import tensorflow as tf

def load(file_path):
    # Read and decode the image, scale it to [0, 1], and resize.
    img = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img, size=(299, 299))
    # The first path component is the class folder, which is the label.
    label = tf.strings.split(file_path, os.sep)[0]
    label = tf.cast(tf.equal(label, 'class_a'), tf.int32)  # binary: 1 for class_a, else 0
    return img, label
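The load function above hard-codes a binary label for 'class_a'. With more than two classes, one option (an extension not in the original answer; function names are illustrative) is to build an integer index from the sorted class names, mimicking how labels='inferred' assigns labels, and look the folder name up in it:

```python
def build_label_index(class_names):
    """Map sorted class names to integer labels, mirroring labels='inferred'."""
    return {name: i for i, name in enumerate(sorted(class_names))}

def label_for(file_path, label_index, sep='/'):
    """Look up the integer label from the path's class-folder component."""
    return label_index[file_path.split(sep)[0]]
```

Inside a tf.data pipeline the same lookup can be done with a tf.lookup.StaticHashTable, but plain Python works fine if labels are resolved before building the dataset.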

Then create your dataset for training:

train_ds = tf.data.Dataset.from_tensor_slices(files).map(load).batch(4)

Then train:

model.fit(train_ds)
