Loading folders of images in tensorflow


Question


I'm new to TensorFlow, but I have already followed and executed the tutorials they promote and many others all over the web. I made a little convolutional neural network over the MNIST images. Nothing special, but I would like to test it on my own images. Now my problem comes: I created several folders; the name of each folder is the class (label) that the images inside belong to.


The images have different shapes; I mean they have no fixed size.


How can I load them for use with TensorFlow?


I followed many tutorials and answers, both here on StackOverflow and on other Q/A sites. But still, I did not figure out how to do this.

Answer


The tf.data API (TensorFlow 1.4 onwards) is great for things like this. The pipeline will look something like the following:

  • Create an initial tf.data.Dataset object that iterates over all examples;
  • (if training) shuffle/repeat the dataset;
  • map it through some function that makes all images the same size;
  • batch;
  • (optionally) prefetch, to tell your program to preprocess subsequent batches of data while the network is processing the current batch; and
  • get inputs.
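Conceptually, the stages above can be sketched in plain Python (this is a stand-in for the real tf.data transformations, not TensorFlow code):

```python
import random

def pipeline(examples, batch_size, training=False, num_epochs=1, seed=0):
    """Pure-Python sketch of the tf.data stages above:
    shuffle/repeat when training, map to a fixed size, then batch."""
    rng = random.Random(seed)
    stream = []
    for _ in range(num_epochs):            # repeat
        epoch = list(examples)
        if training:
            rng.shuffle(epoch)             # shuffle (fresh order each epoch)
        stream.extend(epoch)
    # map: stand-in for resizing every image to the same shape
    mapped = [("resized(%s)" % img, label) for img, label in stream]
    # batch: group consecutive examples
    return [mapped[i:i + batch_size]
            for i in range(0, len(mapped), batch_size)]
```

In the real pipeline each stage becomes a call on tf.data.Dataset (dataset.shuffle(...).repeat().map(...).batch(...)), as in the code later in this answer.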


There are a number of ways of creating your initial dataset (see here for a more in-depth answer).


Supporting TensorFlow version 1.12 onwards, TensorFlow Datasets provides a relatively straightforward API for creating tfrecord datasets, and also handles data downloading, sharding, statistics generation, and other functionality automatically.


See e.g. this image classification dataset implementation. There's a lot of bookkeeping stuff in there (download URLs, citations etc.), but the technical part boils down to specifying features and writing a _generate_examples function.

features = tfds.features.FeaturesDict({
    "image": tfds.features.Image(shape=(_TILES_SIZE,) * 2 + (3,)),
    "label": tfds.features.ClassLabel(names=_CLASS_NAMES),
    "filename": tfds.features.Text(),
})

...

def _generate_examples(self, root_dir):
  root_dir = os.path.join(root_dir, _TILES_SUBDIR)
  for i, class_name in enumerate(_CLASS_NAMES):
    class_dir = os.path.join(root_dir, _class_subdir(i, class_name))
    fns = tf.io.gfile.listdir(class_dir)

    for fn in sorted(fns):
      image = _load_tif(os.path.join(class_dir, fn))
      yield {
          "image": image,
          "label": class_name,
          "filename": fn,
      }


You can also generate the tfrecords using lower-level operations.


Alternatively, you can load the image files from filenames inside tf.data.Dataset.map, as below.

image_paths, labels = load_base_data(...)
epoch_size = len(image_paths)
image_paths = tf.convert_to_tensor(image_paths, dtype=tf.string)
labels = tf.convert_to_tensor(labels)

dataset = tf.data.Dataset.from_tensor_slices((image_paths, labels))

if mode == 'train':
    dataset = dataset.repeat().shuffle(epoch_size)


def map_fn(path, label):
    # path/label represent values for a single example
    image = tf.image.decode_jpeg(tf.read_file(path))

    # some mapping to constant size - be careful with distorting aspect ratios
    image = tf.image.resize_images(image, out_shape)
    # color normalization - just an example
    image = tf.to_float(image) * (2. / 255) - 1
    return image, label


# num_parallel_calls > 1 induces intra-batch shuffling
dataset = dataset.map(map_fn, num_parallel_calls=8)
dataset = dataset.batch(batch_size)
# try one of the following
dataset = dataset.prefetch(1)
# dataset = dataset.apply(
#            tf.contrib.data.prefetch_to_device('/gpu:0'))

images, labels = dataset.make_one_shot_iterator().get_next()
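The snippet above assumes a load_base_data helper that returns image paths and labels; the name and signature here are hypothetical, but one possible sketch, deriving integer labels from the class-named folder layout described in the question, is:

```python
import os

def load_base_data(root_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Hypothetical helper (name assumed from the snippet above): walk a
    directory where each sub-folder is a class, returning parallel lists
    of image paths and integer labels (label = index of the sorted
    class-folder name)."""
    class_names = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d)))
    image_paths, labels = [], []
    for label, name in enumerate(class_names):
        class_dir = os.path.join(root_dir, name)
        for fn in sorted(os.listdir(class_dir)):
            if fn.lower().endswith(extensions):
                image_paths.append(os.path.join(class_dir, fn))
                labels.append(label)
    return image_paths, labels
```

Sorting both the class names and the filenames keeps the path-to-label mapping deterministic across runs, which matters if you ever shard or cache the list.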


I've never worked in a distributed environment, but I've never noticed a performance hit from using this approach over tfrecords. If you need more custom loading functions, also check out tf.py_func.


More general information here, and notes on performance here

