Load image files in a directory as dataset for training in Tensorflow


Question

I am a newbie to TensorFlow, and I'm starting with the official MNIST example code to learn the logic of TensorFlow. However, one thing I don't like is that the MNIST example provides the original dataset as compressed files whose format is unclear to beginners. The same goes for Cifar10, which provides the dataset as a binary file. I think in a practical deep learning task, our dataset may consist of many image files, such as *.jpg or *.png in a directory, along with a text file recording the label of each file (as in the ImageNet dataset). Let me use MNIST as an example.

MNIST contains 50k training images of size 28 x 28. Now let's assume these images are in jpg format and stored in a directory ./dataset/. In ./dataset/, we have a text file label.txt storing the label of each image:

/path/to/dataset/
                 image00001.jpg
                 image00002.jpg
                 ... ... ... ...
                 image50000.jpg
                 label.txt

where label.txt looks like this:

#label.txt:
image00001.jpg 1
image00002.jpg 0
image00003.jpg 4
image00004.jpg 9
... ... ... ... 
image50000.jpg 3

Now I would like to use TensorFlow to train a single-layer model with this dataset. Could anyone help to give a simple code snippet to do that?

Answer

There are basically two things you'd need. The first is normal Python code like so:

import numpy as np
from scipy import misc  # feel free to use another image loader
                        # (scipy.misc.imread is deprecated; imageio.imread is a drop-in replacement)

def create_batches(batch_size):
  images = []
  for img in list_of_images:  # paths of the image files
    images.append(misc.imread(img))
  images = np.asarray(images)

  # do something similar for the labels

  total = len(images)
  while True:
    for i in range(0, total, batch_size):
      yield (images[i:i+batch_size], labels[i:i+batch_size])
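The "do something similar for the labels" step above amounts to parsing label.txt from the question. Here is a minimal sketch of that parsing, assuming the layout shown in the question (one "filename label" pair per line); the helper name `read_labels` is hypothetical:

```python
import os

def read_labels(dataset_dir):
    """Parse label.txt (lines like "image00001.jpg 1") into parallel lists
    of image file paths and integer labels."""
    image_paths, labels = [], []
    with open(os.path.join(dataset_dir, "label.txt")) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            filename, label = parts
            image_paths.append(os.path.join(dataset_dir, filename))
            labels.append(int(label))
    return image_paths, labels
```

The returned `image_paths` list can serve as `list_of_images` in the batching code, and `labels` can be converted to a numpy array alongside the images.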

Now comes the TensorFlow part:

import tensorflow as tf

imgs = tf.placeholder(tf.float32, shape=[None, height, width, colors])
lbls = tf.placeholder(tf.int32, shape=[None, label_dimension])

# define rest of graph here
# convolutions or linear layers and cost function etc.

with tf.Session() as sess:
  batch_generator = create_batches(batch_size)
  for i in range(number_of_epochs):
    images, labels = next(batch_generator)  # generator.next() is Python 2 only
    loss_value = sess.run([loss], feed_dict={imgs: images, lbls: labels})
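Since the question asks for a single-layer model, the "cost function etc." part boils down to a softmax regression: logits = xW + b followed by cross-entropy. As a reference for that math, here is a NumPy sketch of the forward pass and loss; the shapes (784 flattened pixels, 10 classes) and the parameter arrays `W` and `b` are assumptions for MNIST, not part of the answer above:

```python
import numpy as np

def softmax_cross_entropy(images, labels, W, b):
    """Forward pass of a single-layer softmax model.

    images: (batch, 784) flattened pixels, labels: (batch,) int class ids,
    W: (784, 10) weights, b: (10,) biases. Returns mean cross-entropy loss."""
    logits = images @ W + b                      # (batch, 10)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)
    # pick out the probability assigned to each true class
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))
```

With untrained (zero) parameters the probabilities are uniform over the 10 classes, so the loss starts near ln(10) ≈ 2.303; a training op in the TensorFlow graph would then minimize this quantity.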
