TensorFlow's Estimator froze with low CPU usage


Problem description

I updated TensorFlow to v1.0rc1, and Estimator.evaluate no longer works: it freezes at "Restoring model...". I tried to reproduce the problem, and the following sample code makes TF freeze at 220% CPU usage (2 CPUs) with no output at all. Any idea why this happens? Thanks!

import tensorflow as tf
from tensorflow.contrib.layers.python.layers.optimizers import optimize_loss
from tensorflow.contrib.learn.python.learn.estimators import model_fn
from tensorflow.contrib.learn.python.learn.estimators.estimator import Estimator
from tensorflow.python.framework import ops


def main(_):
    def func(features, targets, mode, params):
        idx = tf.concat([features['a'], features['b']], axis=1)

        embedding = tf.get_variable("embed", [10, 20], dtype=tf.float32)

        pred = tf.reduce_sum(tf.nn.embedding_lookup(embedding, idx))

        train_op = optimize_loss(loss=pred,
                                 global_step=tf.train.get_global_step(),
                                 learning_rate=0.001,
                                 optimizer='Adam',
                                 variables=tf.trainable_variables(),
                                 name="training_loss_optimizer")

        eval_metric_dict = dict()
        eval_metric_dict['metric'] = pred

        return model_fn.ModelFnOps(mode=mode,
                                   predictions=pred,
                                   loss=pred,
                                   train_op=train_op,
                                   eval_metric_ops=eval_metric_dict)

    model = Estimator(func, params={})

    model.fit(
        input_fn=lambda: (
            {'a': ops.convert_to_tensor([[1, 2, 3, 4, 5]]), 'b': ops.convert_to_tensor([[2, 3, 4, 3, 5]])},
            None), steps=1)
    model.evaluate(
        input_fn=lambda: (
            {'a': ops.convert_to_tensor([[1, 2, 3, 4, 5]]), 'b': ops.convert_to_tensor([[2, 3, 4, 3, 5]])},
            None))


if __name__ == "__main__":
    tf.app.run()

Recommended answer

By default (when steps is not given), Estimator.evaluate assumes queue-based input and keeps evaluating until the input pipeline is exhausted. When there is no queue-based input, as in the code above where input_fn returns constant tensors, the input is never exhausted, so evaluation loops forever. The fix is easy: pass a steps argument to evaluate, for example steps=1.
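To make the failure mode concrete, here is a minimal pure-Python sketch of the stopping rule described above. It is not TensorFlow's implementation: OutOfRangeError, queue_input, constant_input and evaluate are hypothetical stand-ins. The sketch assumes only that evaluation stops either when the input signals end-of-input or after a fixed number of steps.

```python
class OutOfRangeError(Exception):
    """Hypothetical stand-in for the end-of-input error a finite queue raises."""


def queue_input(batches):
    """Hypothetical finite, queue-like input: raises OutOfRangeError once drained."""
    it = iter(batches)

    def next_batch():
        try:
            return next(it)
        except StopIteration:
            raise OutOfRangeError()

    return next_batch


def constant_input(batch):
    """Hypothetical constant input: returns the same batch forever, never exhausted."""
    return lambda: batch


def evaluate(next_batch, steps=None):
    """Sketch of the loop: run until the input is exhausted or `steps` is reached.

    Returns the number of batches evaluated. With steps=None and a
    constant_input, this loop never terminates (the freeze in the question).
    """
    evaluated = 0
    while steps is None or evaluated < steps:
        try:
            next_batch()  # a real evaluator would update metrics here
        except OutOfRangeError:
            break  # finite input drained: stop on our own
        evaluated += 1
    return evaluated


if __name__ == "__main__":
    print(evaluate(queue_input([1, 2, 3])))          # stops when the queue drains
    print(evaluate(constant_input([1, 2]), steps=1))  # stops only because of steps
```

With a finite queue the loop ends by itself; with constant input, like the constant tensors returned by the question's input_fn, only the steps limit ends it. Calling evaluate(constant_input(...)) without steps would never return, which mirrors the freeze. In the question's code, the corresponding fix is passing steps=1 to model.evaluate.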
