TensorFlow gradient is always zero


Problem description

I have written a small TensorFlow program which convolves an image patch with the same convolution kernel num_unrollings times in a row, and then attempts to minimize the mean squared difference between the resulting values and a target output.

However, when I run the model with num_unrollings greater than 1, the gradient of my loss (tf_loss) with respect to the convolution kernel (tf_kernel) is zero, so no learning occurs.

Here is the smallest code (Python 3) I can come up with which reproduces the problem; sorry about the length:

import tensorflow as tf
import numpy as np

batch_size = 1
kernel_size = 3
num_unrollings = 2

# Input sized so that num_unrollings "VALID" convolutions reduce it to a single output pixel
input_image_size = (kernel_size//2 * num_unrollings)*2 + 1

graph = tf.Graph()

with graph.as_default():
    # Input data
    tf_input_images = tf.random_normal(
        [batch_size, input_image_size, input_image_size, 1]
    )

    tf_outputs = tf.random_normal(
        [batch_size]
    )

    # Convolution kernel
    tf_kernel = tf.Variable(
        tf.zeros([kernel_size, kernel_size, 1, 1])
    )

    # Perform convolution(s)
    _convolved_input = tf_input_images
    for _ in range(num_unrollings):
        _convolved_input = tf.nn.conv2d(
            _convolved_input, 
            tf_kernel, 
            [1, 1, 1, 1], 
            padding="VALID"
        )

    tf_prediction = tf.reshape(_convolved_input, shape=[batch_size])

    tf_loss = tf.reduce_mean(
        tf.squared_difference(
            tf_prediction,
            tf_outputs
        )
    )

    # FIXME: why is this gradient zero when num_unrollings > 1??
    tf_gradient = tf.concat(0, tf.gradients(tf_loss, tf_kernel))

# Calculate and report gradient
with tf.Session(graph=graph) as session:

    tf.initialize_all_variables().run()

    gradient = session.run(tf_gradient)

    print(gradient.reshape(kernel_size**2))
    #prints [ 0.  0.  0.  0.  0.  0.  0.  0.  0.]

Thank you for your help!

Recommended answer

Try replacing

# Convolution kernel
tf_kernel = tf.Variable(
    tf.zeros([kernel_size, kernel_size, 1, 1])
)

with something like:

# Convolution kernel
tf_kernel = tf.Variable(
    tf.random_normal([kernel_size, kernel_size, 1, 1])
)
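The reason the all-zeros initializer gets stuck once num_unrollings is greater than 1 is that, by the chain rule, every term of the gradient of tf_loss with respect to tf_kernel contains at least one factor evaluated with the zero kernel, so the whole gradient is identically zero; with a single convolution the gradient is just the prediction error times the input patches, which is generally non-zero. Here is a minimal scalar analogue of the same effect (an illustrative sketch only, not code from the question or the answer; the variable names are made up):

x = 1.7   # stand-in for an input value
k = 0.0   # "kernel" initialized to zero, as in the question

# One application: f(k) = k * x, so df/dk = x, which is non-zero.
grad_one_unrolling = x

# Two applications of the same kernel: f(k) = k * (k * x) = x * k**2,
# so df/dk = 2 * x * k, which is exactly zero at k = 0.
grad_two_unrollings = 2 * x * k

print(grad_one_unrolling)   # 1.7
print(grad_two_unrollings)  # 0.0

Any initializer that keeps tf_kernel away from all-zeros (such as tf.random_normal above) avoids this degenerate point, and gradient descent can then make progress.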
