Loss function works with reduce_mean but not reduce_sum


Question


I'm new to TensorFlow, and have been looking at the examples here. I wanted to rewrite the multilayer perceptron classification model as a regression model. However, I encountered some strange behaviour when modifying the loss function. It works fine with tf.reduce_mean, but if I try using tf.reduce_sum it gives NaNs in the output. This seems very strange, as the functions are very similar - the only difference is that the mean divides the sum by the number of elements. So how could this change introduce NaNs?

import numpy as np
import tensorflow as tf

# Parameters
learning_rate = 0.001

# Network Parameters
n_hidden_1 = 32 # 1st layer number of features
n_hidden_2 = 32 # 2nd layer number of features
n_input = 2 # number of inputs
n_output = 1 # number of outputs

# Make artificial data
SAMPLES = 1000
X = np.random.rand(SAMPLES, n_input)
T = np.c_[X[:,0]**2 + np.sin(X[:,1])]

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_output])

# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with tanh activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.tanh(layer_1)
    # Hidden layer with tanh activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.tanh(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_output]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_output]))
}

pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
#se = tf.reduce_sum(tf.square(pred - y))   # Why does this give nans?
mse = tf.reduce_mean(tf.square(pred - y))  # When this doesn't?
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(mse)

# Initializing the variables
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

training_epochs = 10
display_step = 1

# Training cycle
for epoch in range(training_epochs):
    avg_cost = 0.
    # Loop over all batches
    for i in range(100):
        # Run optimization op (backprop) and cost op (to get loss value)
        _, msev = sess.run([optimizer, mse], feed_dict={x: X, y: T})
    # Display logs per epoch step
    if epoch % display_step == 0:
        print("Epoch:", '%04d' % (epoch+1), "mse=", \
            "{:.9f}".format(msev))


The problematic variable se is commented out. It should be used in place of mse.


With mse the output looks like this:

Epoch: 0001 mse= 0.051669389
Epoch: 0002 mse= 0.031438075
Epoch: 0003 mse= 0.026629323
...

With se it ends up like this:

Epoch: 0001 se= nan
Epoch: 0002 se= nan
Epoch: 0003 se= nan
...

Answer


The loss summed across the batch is 1000 times larger (from skimming the code, I think your training batch size is 1000), so your gradients and parameter updates are also 1000 times larger. The larger updates apparently lead to NaNs.
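To see the scaling concretely, here is a quick standalone check (my own toy example, not from the question) showing that the summed loss is exactly the element count times the mean loss, so its gradients scale by the same factor:

import tensorflow as tf

# Stand-in for (pred - y): three "samples" with made-up errors
err = tf.constant([[0.1], [0.2], [0.3]])
se  = tf.reduce_sum(tf.square(err))   # summed squared error
mse = tf.reduce_mean(tf.square(err))  # mean squared error

with tf.Session() as sess:
    s, m = sess.run([se, mse])
    print(s, m * 3)  # both print ~0.14: se == n_elements * mse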


Generally, learning rates are expressed per example, so the loss used to compute the update gradients should be per example as well. If the loss is summed over the batch, then the learning rate needs to be divided by the batch size to get comparable training results.
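If you do want to keep reduce_sum, one fix is to rescale the learning rate accordingly. A minimal sketch, reusing the names from the question's code (pred, y, learning_rate, SAMPLES) and assuming the full 1000-sample dataset is fed each step:

# Summed loss, trained with a learning rate divided by the batch size
se = tf.reduce_sum(tf.square(pred - y))
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=learning_rate / SAMPLES).minimize(se)

Since the gradient of se is SAMPLES times the gradient of mse, this produces the same parameter updates as minimizing mse with the original learning rate, which is why the reduce_mean version trains without blowing up.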

