TensorFlow - No gradients provided for any variable


Problem description

I am experimenting with some code in Jupyter and keep getting stuck here. Everything works fine if I remove the line starting with "optimizer = ..." and all references to it. But as soon as I put this line back into the code, I get an error.

I am not pasting all the other functions here, to keep the code at a readable size. I hope someone more experienced can see at once what the problem is.

Note that the input layer, the two hidden layers, and the output layer have 5, 4, 3, and 2 units, respectively.

Code:

tf.reset_default_graph()

num_units_in_layers = [5,4,3,2]

X = tf.placeholder(shape=[5, 3], dtype=tf.float32)
Y = tf.placeholder(shape=[2, 3], dtype=tf.float32)
parameters = initialize_layer_parameters(num_units_in_layers)
init = tf.global_variables_initializer() 

my_sess = tf.Session()
my_sess.run(init)
ZL = forward_propagation_with_relu(X, num_units_in_layers, parameters, my_sess)
#my_sess.run(parameters)  # Do I need to run this? Or is it obsolete?

cost = compute_cost(ZL, Y, my_sess, parameters, batch_size=3, lambd=0.05)
optimizer =  tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)
_ , minibatch_cost = my_sess.run([optimizer, cost], 
                                 feed_dict={X: minibatch_X, 
                                            Y: minibatch_Y})

print(minibatch_cost)
my_sess.close()

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-321-135b9fc18268> in <module>()
     16 cost = compute_cost(ZL, Y, my_sess, parameters, 3, 0.05)
     17 
---> 18 optimizer =  tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)
     19 _ , minibatch_cost = my_sess.run([optimizer, cost], 
     20                                  feed_dict={X: minibatch_X, 

~/.local/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py in minimize(self, loss, global_step, var_list, gate_gradients, aggregation_method, colocate_gradients_with_ops, name, grad_loss)
    362           "No gradients provided for any variable, check your graph for ops"
    363           " that do not support gradients, between variables %s and loss %s." %
--> 364           ([str(v) for _, v in grads_and_vars], loss))
    365 
    366     return self.apply_gradients(grads_and_vars, global_step=global_step,

ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'weights/W1:0' shape=(4, 5) dtype=float32_ref>", "<tf.Variable 'biases/b1:0' shape=(4, 1) dtype=float32_ref>", "<tf.Variable 'weights/W2:0' shape=(3, 4) dtype=float32_ref>", "<tf.Variable 'biases/b2:0' shape=(3, 1) dtype=float32_ref>", "<tf.Variable 'weights/W3:0' shape=(2, 3) dtype=float32_ref>", "<tf.Variable 'biases/b3:0' shape=(2, 1) dtype=float32_ref>"] and loss Tensor("Add_3:0", shape=(), dtype=float32).

Note that if I run

print(tf.trainable_variables())

just before the "optimizer = ..." line, I do see my trainable variables there:

[<tf.Variable 'weights/W1:0' shape=(4, 5) dtype=float32_ref>, <tf.Variable 'biases/b1:0' shape=(4, 1) dtype=float32_ref>, <tf.Variable 'weights/W2:0' shape=(3, 4) dtype=float32_ref>, <tf.Variable 'biases/b2:0' shape=(3, 1) dtype=float32_ref>, <tf.Variable 'weights/W3:0' shape=(2, 3) dtype=float32_ref>, <tf.Variable 'biases/b3:0' shape=(2, 1) dtype=float32_ref>]

Would anyone have an idea what the problem could be?

EDIT, adding some more info: in case you would like to see how I create and initialize my parameters, here is the code. Maybe there is something wrong with this part, but I don't see what.

def get_nn_parameter(variable_scope, variable_name, dim1, dim2):
    # Create (or reuse) a trainable [dim1, dim2] variable inside the given scope.
    with tf.variable_scope(variable_scope, reuse=tf.AUTO_REUSE):
        v = tf.get_variable(variable_name,
                            [dim1, dim2],
                            trainable=True,
                            initializer=tf.contrib.layers.xavier_initializer())
    return v


def initialize_layer_parameters(num_units_in_layers):
    parameters = {}
    L = len(num_units_in_layers)

    for i in range (1, L):
        temp_weight = get_nn_parameter("weights",
                                       "W"+str(i), 
                                       num_units_in_layers[i], 
                                       num_units_in_layers[i-1])
        parameters.update({"W" + str(i) : temp_weight})  
        temp_bias = get_nn_parameter("biases",
                                     "b"+str(i), 
                                     num_units_in_layers[i], 
                                     1)
        parameters.update({"b" + str(i) : temp_bias})  

    return parameters

ADDENDUM

I got it working. Instead of writing a separate answer, I am adding the correct version of my code here.

(David's answer below helped a lot.)

I simply removed my_sess as a parameter of my compute_cost function. (I could not make it work previously, but it turns out it is not needed at all.) I also reordered the statements in my main function so that things are called in the right order.
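One plausible reason the statement order matters (an assumption on my part, not something stated above): Adam creates extra variables of its own when minimize() is called, so an init op built before that call would not cover them. A minimal sketch, assuming TensorFlow 2.x is installed and the TF1 graph API is reached through tensorflow.compat.v1; the variable name w_order_demo is hypothetical:

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: TF 2.x install, TF1-style graphs via compat.v1
tf.reset_default_graph()

# A single toy variable and a loss that depends on it.
w = tf.get_variable("w_order_demo", shape=[2, 1], initializer=tf.ones_initializer())
loss = tf.reduce_sum(tf.square(w))

# Compare the set of global variables before and after minimize().
names_before = {v.name for v in tf.global_variables()}
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)
names_after = {v.name for v in tf.global_variables()}

# Variables that exist only because the optimizer created them.
optimizer_vars = sorted(names_after - names_before)
print(optimizer_vars)

# Building the init op last means it also initializes the optimizer's variables.
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    _, loss_value = sess.run([train_op, loss])
print(loss_value)
```

This matches the working version below, where init is created after the optimizer line.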

Here is the working version of my cost function and how I call it:

def compute_cost(ZL, Y, parameters, mb_size, lambd):

    logits = tf.transpose(ZL)
    labels = tf.transpose(Y)

    cost_unregularized = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits = logits, labels = labels))

    #Since the dict parameters includes both W and b, it needs to be divided with 2 to find L
    L = len(parameters) // 2

    list_sum_weights = []

    for i in range (0, L):
        list_sum_weights.append(tf.nn.l2_loss(parameters.get("W"+str(i+1))))

    regularization_effect = tf.multiply((lambd / mb_size), tf.add_n(list_sum_weights))
    cost = tf.add(cost_unregularized, regularization_effect)

    return cost

And here is the main function where I call compute_cost(..):

tf.reset_default_graph()

num_units_in_layers = [5,4,3,2]

X = tf.placeholder(shape=[5, 3], dtype=tf.float32)
Y = tf.placeholder(shape=[2, 3], dtype=tf.float32)
parameters = initialize_layer_parameters(num_units_in_layers)

my_sess = tf.Session()
ZL = forward_propagation_with_relu(X, num_units_in_layers, parameters)

cost = compute_cost(ZL, Y, parameters, 3, 0.05)
optimizer =  tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)
init = tf.global_variables_initializer() 

my_sess.run(init)
_ , minibatch_cost = my_sess.run([optimizer, cost], 
                                 feed_dict={X: [[-1.,4.,-7.],[2.,6.,2.],[3.,3.,9.],[8.,4.,4.],[5.,3.,5.]], 
                                            Y: [[0.6, 0., 0.3], [0.4, 0., 0.7]]})


print(minibatch_cost)

my_sess.close()

Recommended answer

I'm 99.9% sure you're creating your cost function incorrectly.

cost = compute_cost(ZL, Y, my_sess, parameters, batch_size=3, lambd=0.05)

Your cost should be a tensor. You are passing your session into the cost function, which suggests the function is actually running the TensorFlow session internally, and that is a serious error.

Then later you're passing the result of compute_cost to your minimizer.
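To illustrate how this can produce the error, here is a minimal sketch of one way it might happen. The original compute_cost is not shown, so the mistake below is hypothetical: a tensor is evaluated with sess.run and the loss is rebuilt from the resulting NumPy value, which leaves no path from the loss back to the variables. It assumes TensorFlow 2.x with the TF1 API available as tensorflow.compat.v1; all names are made up for the demo.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: TF 2.x install, TF1-style graphs via compat.v1
tf.reset_default_graph()

w = tf.get_variable("w_demo", shape=[2, 2], initializer=tf.ones_initializer())
x = tf.placeholder(tf.float32, shape=[2, 1])
z = tf.matmul(w, x)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Hypothetical mistake: evaluate z now and keep only its numeric value.
    z_value = sess.run(z, feed_dict={x: [[1.0], [2.0]]})

broken_loss = tf.reduce_sum(tf.constant(z_value))  # built from a constant, unrelated to w
good_loss = tf.reduce_sum(z)                       # symbolic, still depends on w

# tf.gradients reports None for variables the loss does not depend on.
broken_grads = tf.gradients(broken_loss, [w])
good_grads = tf.gradients(good_loss, [w])
print(broken_grads, good_grads)

# minimize() refuses to build a train op when every gradient is None.
try:
    tf.train.GradientDescentOptimizer(0.1).minimize(broken_loss)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```

Checking tf.gradients(cost, tf.trainable_variables()) like this is also a quick way to see which variables are disconnected from a loss.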

This is a common misunderstanding about TensorFlow.

TensorFlow (in graph mode, as used here) follows a declarative programming model: you first declare all the operations you want to run, and only afterwards pass data in and run them.
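A small sketch of what "declare first, run later" means in practice, assuming TensorFlow 2.x with the TF1 API available as tensorflow.compat.v1 (the names a, b and total are just for illustration):

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: TF 2.x install, TF1-style graphs via compat.v1
tf.reset_default_graph()

# Declaration phase: nothing is computed here, we only describe the computation.
a = tf.placeholder(tf.float32, shape=[])
b = tf.placeholder(tf.float32, shape=[])
total = a + b

print(total)  # a symbolic Tensor object, not a number

# Execution phase: data goes in through feed_dict, values come out of sess.run.
with tf.Session() as sess:
    result = sess.run(total, feed_dict={a: 2.0, b: 3.0})
print(result)
```

The cost passed to minimize() has to be a symbolic tensor like total, not an already computed value like result.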

Refactor your code to strictly follow this best practice:

(1) Create a build_graph() function and place all of your math operations in it. Define your cost function and all layers of the network there. Return the optimizer.minimize() training op (and any other ops you might want to get back, such as accuracy).

(2) Now create a session.

(3) After this point, do not create any more TensorFlow operations or variables. If you feel like you need to, you're doing something wrong.

(4) Call sess.run on your train_op, and pass in the placeholder data via feed_dict.
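The four steps above can be sketched roughly as follows for the 5-4-3-2 network from the question. This is only an illustration under assumptions: it uses tensorflow.compat.v1 on a TensorFlow 2.x install, the forward pass is my own guess (the asker's forward_propagation_with_relu is not shown), and tf.glorot_uniform_initializer stands in for the tf.contrib Xavier initializer.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: TF 2.x install, TF1-style graphs via compat.v1


def build_graph(num_units_in_layers, batch_size, lambd=0.05, learning_rate=0.001):
    """Step 1: declare placeholders, layers, cost and train op. No session here."""
    X = tf.placeholder(tf.float32, shape=[num_units_in_layers[0], batch_size])
    Y = tf.placeholder(tf.float32, shape=[num_units_in_layers[-1], batch_size])

    A = X
    weights = []
    num_layers = len(num_units_in_layers) - 1
    for i in range(1, num_layers + 1):
        W = tf.get_variable("W" + str(i),
                            [num_units_in_layers[i], num_units_in_layers[i - 1]],
                            initializer=tf.glorot_uniform_initializer())
        b = tf.get_variable("b" + str(i), [num_units_in_layers[i], 1],
                            initializer=tf.zeros_initializer())
        Z = tf.matmul(W, A) + b
        # ReLU on hidden layers, raw logits on the output layer (hypothetical choice).
        A = tf.nn.relu(Z) if i < num_layers else Z
        weights.append(W)

    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(
            logits=tf.transpose(A), labels=tf.transpose(Y)))
    l2_penalty = (lambd / batch_size) * tf.add_n([tf.nn.l2_loss(W) for W in weights])
    cost = cross_entropy + l2_penalty

    train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    init = tf.global_variables_initializer()  # created last, after the optimizer
    return X, Y, cost, train_op, init


tf.reset_default_graph()
X, Y, cost, train_op, init = build_graph([5, 4, 3, 2], batch_size=3)

minibatch_X = np.array([[-1., 4., -7.], [2., 6., 2.], [3., 3., 9.],
                        [8., 4., 4.], [5., 3., 5.]], dtype=np.float32)
minibatch_Y = np.array([[0.6, 0., 0.3], [0.4, 0., 0.7]], dtype=np.float32)

with tf.Session() as sess:  # Step 2: create the session
    sess.run(init)
    # Steps 3 and 4: no new ops from here on, just run the train op with data.
    _, minibatch_cost = sess.run([train_op, cost],
                                 feed_dict={X: minibatch_X, Y: minibatch_Y})
print(minibatch_cost)
```

Keeping every op inside build_graph() makes it hard to accidentally mix graph construction with sess.run calls, which is the pattern that caused the original error.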

Here's a simple example of how to structure your code:

https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/neural_network_raw.ipynb

In general, aymericdamien has put up some tremendously good examples; I strongly recommend reviewing them to learn the basics of TensorFlow.
