Tensorflow - No gradients provided for any variable


Question

I am experimenting with some code in Jupyter and I keep getting stuck here. Things actually work fine if I remove the line starting with "optimizer = ..." and all references to it, but as soon as I put that line back in, it gives an error.

I am not pasting all the other functions here, to keep the code at a readable size. I hope someone more experienced can see at once what the problem is.

Note that the input layer, the two hidden layers, and the output layer have 5, 4, 3, and 2 units, respectively.

Code:

tf.reset_default_graph()

num_units_in_layers = [5,4,3,2]

X = tf.placeholder(shape=[5, 3], dtype=tf.float32)
Y = tf.placeholder(shape=[2, 3], dtype=tf.float32)
parameters = initialize_layer_parameters(num_units_in_layers)
init = tf.global_variables_initializer() 

my_sess = tf.Session()
my_sess.run(init)
ZL = forward_propagation_with_relu(X, num_units_in_layers, parameters, my_sess)
#my_sess.run(parameters)  # Do I need to run this? Or is it obsolete?

cost = compute_cost(ZL, Y, my_sess, parameters, batch_size=3, lambd=0.05)
optimizer =  tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)
_ , minibatch_cost = my_sess.run([optimizer, cost], 
                                 feed_dict={X: minibatch_X, 
                                            Y: minibatch_Y})

print(minibatch_cost)
my_sess.close()

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-321-135b9fc18268> in <module>()
     16 cost = compute_cost(ZL, Y, my_sess, parameters, 3, 0.05)
     17 
---> 18 optimizer =  tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)
     19 _ , minibatch_cost = my_sess.run([optimizer, cost], 
     20                                  feed_dict={X: minibatch_X, 

~/.local/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py in minimize(self, loss, global_step, var_list, gate_gradients, aggregation_method, colocate_gradients_with_ops, name, grad_loss)
    362           "No gradients provided for any variable, check your graph for ops"
    363           " that do not support gradients, between variables %s and loss %s." %
--> 364           ([str(v) for _, v in grads_and_vars], loss))
    365 
    366     return self.apply_gradients(grads_and_vars, global_step=global_step,

ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'weights/W1:0' shape=(4, 5) dtype=float32_ref>", "<tf.Variable 'biases/b1:0' shape=(4, 1) dtype=float32_ref>", "<tf.Variable 'weights/W2:0' shape=(3, 4) dtype=float32_ref>", "<tf.Variable 'biases/b2:0' shape=(3, 1) dtype=float32_ref>", "<tf.Variable 'weights/W3:0' shape=(2, 3) dtype=float32_ref>", "<tf.Variable 'biases/b3:0' shape=(2, 1) dtype=float32_ref>"] and loss Tensor("Add_3:0", shape=(), dtype=float32).

Note that if I run

print(tf.trainable_variables())

just before the "optimizer = ..." line, I actually see my trainable variables there.

[<tf.Variable 'weights/W1:0' shape=(4, 5) dtype=float32_ref>, <tf.Variable 'biases/b1:0' shape=(4, 1) dtype=float32_ref>, <tf.Variable 'weights/W2:0' shape=(3, 4) dtype=float32_ref>, <tf.Variable 'biases/b2:0' shape=(3, 1) dtype=float32_ref>, <tf.Variable 'weights/W3:0' shape=(2, 3) dtype=float32_ref>, <tf.Variable 'biases/b3:0' shape=(2, 1) dtype=float32_ref>]

Would anyone have an idea what the problem might be?

EDIT, ADDING SOME MORE INFO: In case you would like to see how I create and initialize my parameters, here is the code. Maybe something is wrong with this part, but I don't see what.

def get_nn_parameter(variable_scope, variable_name, dim1, dim2):
  with tf.variable_scope(variable_scope, reuse=tf.AUTO_REUSE):
    v = tf.get_variable(variable_name, 
                        [dim1, dim2], 
                        trainable=True, 
                        initializer = tf.contrib.layers.xavier_initializer())
  return v


def initialize_layer_parameters(num_units_in_layers):
    parameters = {}
    L = len(num_units_in_layers)

    for i in range (1, L):
        temp_weight = get_nn_parameter("weights",
                                       "W"+str(i), 
                                       num_units_in_layers[i], 
                                       num_units_in_layers[i-1])
        parameters.update({"W" + str(i) : temp_weight})  
        temp_bias = get_nn_parameter("biases",
                                     "b"+str(i), 
                                     num_units_in_layers[i], 
                                     1)
        parameters.update({"b" + str(i) : temp_bias})  

    return parameters
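
For reference, here is a quick illustrative check (not part of the original post) of what this function returns; the shapes line up with the variables listed in the error message above:

params = initialize_layer_parameters([5, 4, 3, 2])
print(params["W1"].shape)  # (4, 5)
print(params["b1"].shape)  # (4, 1)
print(params["W2"].shape)  # (3, 4)
print(params["W3"].shape)  # (2, 3)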


Addendum

I got it working. Instead of writing a separate answer, I am adding the correct version of my code here.

(David's answer below helped a lot.)

I simply removed my_sess as a parameter to my compute_cost function. (I could not make it work that way before, but apparently it is not needed at all.) I also reordered the statements in my main function so that things are called in the right order.

Here is the working version of my cost function and how I call it:

def compute_cost(ZL, Y, parameters, mb_size, lambd):

    logits = tf.transpose(ZL)
    labels = tf.transpose(Y)

    cost_unregularized = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits = logits, labels = labels))

    # Since the parameters dict contains both W and b entries, divide its length by 2 to get the number of layers L
    L = len(parameters) // 2

    list_sum_weights = []

    for i in range (0, L):
        list_sum_weights.append(tf.nn.l2_loss(parameters.get("W"+str(i+1))))

    regularization_effect = tf.multiply((lambd / mb_size), tf.add_n(list_sum_weights))
    cost = tf.add(cost_unregularized, regularization_effect)

    return cost

And here is the main function where I call the compute_cost(..) function:

tf.reset_default_graph()

num_units_in_layers = [5,4,3,2]

X = tf.placeholder(shape=[5, 3], dtype=tf.float32)
Y = tf.placeholder(shape=[2, 3], dtype=tf.float32)
parameters = initialize_layer_parameters(num_units_in_layers)

my_sess = tf.Session()
ZL = forward_propagation_with_relu(X, num_units_in_layers, parameters)

cost = compute_cost(ZL, Y, parameters, 3, 0.05)
optimizer =  tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)
init = tf.global_variables_initializer() 

my_sess.run(init)
_ , minibatch_cost = my_sess.run([optimizer, cost], 
                                 feed_dict={X: [[-1.,4.,-7.],[2.,6.,2.],[3.,3.,9.],[8.,4.,4.],[5.,3.,5.]], 
                                            Y: [[0.6, 0., 0.3], [0.4, 0., 0.7]]})


print(minibatch_cost)

my_sess.close()

Answer

I'm 99.9% sure you're creating your cost function incorrectly.

cost = compute_cost(ZL, Y, my_sess, parameters, batch_size=3, lambd=0.05)

Your cost function should be a tensor. You are passing your session into the cost function, and it looks like the function is actually trying to run the TensorFlow session, which is where this goes wrong.

Then later you're passing the result of compute_cost to your minimizer.
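
Concretely, here is an illustrative fragment (not the asker's code; ZL, Y, and cost_tensor are placeholder names) of what minimize() needs to receive versus what it must not receive:

# OK: cost is a symbolic tensor built from ZL and Y, so it still depends on
# the trainable weights and the optimizer can compute gradients through it.
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(logits=tf.transpose(ZL),
                                               labels=tf.transpose(Y)))
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

# Not OK: evaluating the cost with the session returns a plain Python/NumPy
# number; minimize() then sees a constant with no path back to any variable,
# which produces exactly the "No gradients provided for any variable" error.
# cost_value = my_sess.run(cost_tensor)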

This is a common misunderstanding about tensorflow.

TensorFlow uses a declarative programming paradigm: you first declare all the operations you want to run, and only later do you pass data in and run them.

Refactor your code to strictly follow this best practice (a minimal sketch follows the list below):

(1) Create a build_graph() function; all of your math operations should be placed in this function. Define your cost function and all the layers of the network there. Return the optimizer.minimize() training op (and any other ops you might want to get back, such as accuracy).

(2) Now create a session.

(3) After this point, do not create any more TensorFlow operations or variables; if you feel like you need to, you are doing something wrong.

(4) Call sess.run on your train_op, and pass in the placeholder data via feed_dict.
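
To make those four steps concrete, here is a minimal illustrative sketch in the same TF1-style API as the question. The tiny single-layer model, variable names, and initializer are placeholders for the idea only, not the asker's actual network:

import tensorflow as tf

def build_graph():
    # (1) Declare every op here: placeholders, layers, cost, and the train op.
    X = tf.placeholder(tf.float32, shape=[5, 3], name="X")
    Y = tf.placeholder(tf.float32, shape=[2, 3], name="Y")
    W = tf.get_variable("W", [2, 5], initializer=tf.contrib.layers.xavier_initializer())
    b = tf.get_variable("b", [2, 1], initializer=tf.zeros_initializer())
    ZL = tf.matmul(W, X) + b                     # shape (2, 3), as in the question
    cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(logits=tf.transpose(ZL),
                                                   labels=tf.transpose(Y)))
    train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
    return X, Y, cost, train_op

tf.reset_default_graph()
X, Y, cost, train_op = build_graph()
init = tf.global_variables_initializer()

# (2) Only now create the session; (3) create no further ops past this point.
with tf.Session() as sess:
    sess.run(init)
    # (4) Run the train op, feeding the placeholder data via feed_dict.
    _, c = sess.run([train_op, cost],
                    feed_dict={X: [[-1., 4., -7.], [2., 6., 2.], [3., 3., 9.],
                                   [8., 4., 4.], [5., 3., 5.]],
                               Y: [[0.6, 0., 0.3], [0.4, 0., 0.7]]})
    print(c)

The key point is that minimize(cost) is called while cost is still a symbolic tensor in the graph; nothing has been run yet.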

Here's a fuller example of how to structure your code:

https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/neural_network_raw.ipynb

In general, aymericdamien has put up tremendously good examples, and I strongly recommend reviewing them to learn the basics of TensorFlow.
