Tensorflow Dataset API doubles graph protobuff filesize


Problem description

Summary: Using the new tf.contrib.data.Dataset doubles the size of my graph protobuff file and I'm unable to visualize the graph in Tensorboard.

Details:

I'm trying out the new TensorFlow tf.contrib.data.Dataset functionality together with the tf.contrib.learn.Experiment framework. My input data is defined as input functions which return tensors of features and labels.

If I create my input function with the tf.train.slice_input_producer function as in the following code block (full code here), then my resulting graph.pbtxt file is 620M and the .meta files are around 165M in size.

def train_inputs():
    with tf.name_scope('Training_data'):
        x = tf.constant(mnist.train.images.reshape([-1, 28, 28, 1]))
        y = tf.constant(mnist.train.labels)
        sliced_input = tf.train.slice_input_producer(
            tensor_list=[x, y], shuffle=True)
        return tf.train.shuffle_batch(
            sliced_input, batch_size=batch_size,
            capacity=10000, min_after_dequeue=batch_size*10)

Now if I create my input function with the new tf.contrib.data.Dataset.from_tensor_slices as in the following code block (full code here), then my resulting graph.pbtxt file doubles in size to 1.3G and the .meta files double in size to 330M.

def train_inputs():
    with tf.name_scope('Training_data'):
        images = mnist.train.images.reshape([-1, 28, 28, 1])
        labels = mnist.train.labels
        dataset = tf.contrib.data.Dataset.from_tensor_slices(
            (images, labels))
        dataset = dataset.repeat(None)  # Infinite
        dataset = dataset.shuffle(buffer_size=10000)
        dataset = dataset.batch(batch_size)
        iterator = dataset.make_one_shot_iterator()
        next_example, next_label = iterator.get_next()
        return next_example, next_label

Now, because the graph.pbtxt file is so big, TensorBoard takes ages to parse it, and I'm unable to debug my model graph visually. I found in the Dataset documentation that this increase in size comes from "the contents of the array will be copied multiple times", and that the solution is to use placeholders. However, in this case, I would need to feed the numpy arrays into the placeholders with an active session to initialize the iterator:

sess.run(iterator.initializer, feed_dict={features_placeholder: features, labels_placeholder: labels})
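The copying behaviour the docs describe can be illustrated without TensorFlow. The sketch below models a serialized graph as a plain dict (a hypothetical layout; a real GraphDef is a protobuf) and compares embedding the data once, embedding it twice, and storing only a placeholder's dtype and shape:

```python
import pickle

def payload():
    # Fresh bytes object each call, standing in for the raw float32
    # data of an MNIST-sized array (1000 examples of 784 values).
    return b"\x00" * (1000 * 784 * 4)

# Hypothetical stand-ins for serialized graphs.
graph_with_constant = {"const/x": payload()}          # data embedded once
graph_with_two_copies = {"const/x": payload(),
                         "dataset_const/x": payload()}  # embedded a second time
graph_with_placeholder = {"placeholder/x": ("float32", (None, 784))}

size_one = len(pickle.dumps(graph_with_constant))
size_two = len(pickle.dumps(graph_with_two_copies))
size_ph = len(pickle.dumps(graph_with_placeholder))

print(size_two > 1.9 * size_one)  # True: a second embedded copy roughly doubles it
print(size_ph < 1000)             # True: the placeholder version stays tiny
```

The point of the sketch is only the size arithmetic: whatever holds a second inline copy of the array roughly doubles the serialized file, while a placeholder costs almost nothing because the data is fed at runtime.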

This seems, however, to be out of my control when using the tf.contrib.learn.Experiment framework.

How can I run the iterator's initializer with the Experiment framework? Or is there a workaround for using the Dataset API without increasing my graph size?

Recommended answer

I found a solution to my problem using tf.train.SessionRunHook. I create a SessionRunHook object that initialises the iterator after the session is created:

class IteratorInitializerHook(tf.train.SessionRunHook):
    """Hook to initialise the data iterator after the Session is created."""

    def __init__(self):
        super(IteratorInitializerHook, self).__init__()
        # Wired up later, once the iterator (and its placeholders) exist.
        self.iterator_initiliser_func = None

    def after_create_session(self, session, coord):
        # Called by the framework right after session creation.
        self.iterator_initiliser_func(session)

The initializer function is set when creating the Dataset Iterator:

iterator_initiliser_hook.iterator_initiliser_func = \
    lambda sess: sess.run(
        iterator.initializer,
        feed_dict={images_placeholder: images,
                   labels_placeholder: labels})

I then pass the hook objects to the train_monitors and eval_hooks parameters of tf.contrib.learn.Experiment.
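As a TensorFlow-free sketch of the control flow (MockSession and the wiring steps are hypothetical stand-ins for what Experiment and tf.Session do internally; the attribute name mirrors the hook class above):

```python
class IteratorInitializerHook(object):
    """Mock of the hook: the callback is wired up after construction."""

    def __init__(self):
        self.iterator_initiliser_func = None

    def after_create_session(self, session, coord):
        # Invoked by the framework once the session exists.
        self.iterator_initiliser_func(session)


class MockSession(object):
    """Stand-in for tf.Session; records which ops were run."""

    def __init__(self):
        self.ran = []

    def run(self, op):
        self.ran.append(op)


# 1. Build the hook before the input pipeline exists.
hook = IteratorInitializerHook()

# 2. While building the input function, wire in the initializer call.
hook.iterator_initiliser_func = lambda sess: sess.run("iterator/init")

# 3. The framework (Experiment, via train_monitors/eval_hooks) creates a
#    session and invokes the hook, which runs the iterator's initializer.
session = MockSession()
hook.after_create_session(session, coord=None)

print(session.ran)  # ['iterator/init']
```

This shows why the pattern sidesteps the chicken-and-egg problem: the hook is handed to Experiment up front, but the actual sess.run of the initializer is deferred until a session exists.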

The resulting graph.pbtxt file is now only 500K while the .meta files are only 244K.

Full example here.

