Slightly different shape converges to wrong number - why?
Question
I'm trying to figure out why TensorFlow is doing something surprising. I've boiled it down to a test case, attempting linear regression on a trivial problem that just adds two inputs together. The weights converge to 1.0 and the bias to 0.0 as they should.
With this version of the training outputs:
train_y = [2., 3., 4.]
the cost converges to 0.0 as it should, but with this version:
train_y = [[2.], [3.], [4.]]
the cost converges to 4.0. I wouldn't be so surprised if the second version gave an error message; what's surprising is that it silently gives a wrong answer. Why is it doing this?
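For context (a NumPy sketch, not part of the question): the two versions of train_y differ in rank, not just in notation, and that rank difference is what triggers the behaviour described below:

```python
import numpy as np

# First version: a rank-1 vector of 3 targets
print(np.array([2., 3., 4.]).shape)        # (3,)

# Second version: a rank-2 column matrix of 3 targets
print(np.array([[2.], [3.], [4.]]).shape)  # (3, 1)
```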
Complete code for the test case:
import tensorflow as tf
sess = tf.InteractiveSession()
tf.set_random_seed(1)

# Parameters
epochs = 10000
learning_rate = 0.01

# Data
train_x = [[1., 1.], [1., 2.], [2., 2.]]

# It works with this version
train_y = [2., 3., 4.]

# But converges on cost 4.0 with this version
#train_y = [[2.], [3.], [4.]]

# Number of samples
n_samples = len(train_x)

# Inputs and outputs
x = tf.placeholder(tf.float32, name='x')
y = tf.placeholder(tf.float32, name='y')

# Weights
w = tf.Variable(tf.random_normal([2]), name='weight')
b = tf.Variable(tf.random_normal([]), name='bias')

# Model
pred = tf.tensordot(x, w, 1) + b
cost = tf.reduce_sum((pred - y)**2 / n_samples)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Train
tf.global_variables_initializer().run()
for epoch in range(epochs):
    # Print update at successive doublings of time
    if epoch & (epoch - 1) == 0 or epoch == epochs - 1:
        print('{:6}'.format(epoch), end=' ')
        print('{:12.6f}'.format(cost.eval({x: train_x, y: train_y})), end=' ')
        print(' [' + ', '.join('{:8.6f}'.format(z) for z in w.eval()) + ']', end=' ')
        print('{:12.6f}'.format(b.eval()))
    for (x1, y1) in zip(train_x, train_y):
        optimizer.run({x: x1, y: y1})
Answer
Why?
The problem is the computation of the cost function when you feed tensors of different shapes. More specifically, it is the pred - y computation.
To show you what went wrong in this specific example while avoiding the clutter, I will use constants with the same shapes and values you mentioned above:
y0 = tf.constant([2., 3., 4.])
y1 = tf.constant([[2.], [3.], [4.]])
pred = tf.constant([2., 3., 4.])
Now, let's see the shapes of the expressions pred - y0 and pred - y1:
res0 = pred - y0
res1 = pred - y1
print(res0.shape)
print(res1.shape)
The output is:
(3,)
(3, 3)
The (3, 3) shows that when pred - y1 is calculated with shapes (3,) and (3, 1), broadcasting produces shape (3, 3). This also means that the tf.reduce_sum() call sums 3x3 = 9 elements rather than only 3.
You can solve this for this case by transposing y1 to (1, 3) using tf.transpose():
res1_fixed = pred - tf.transpose(y1)
print(res1_fixed.shape)
Now the output is:
(1, 3)
How to fix:
Now, back to your code... simply change the following expression:
cost = tf.reduce_sum((pred-y)**2 / n_samples)
to:
cost = tf.reduce_sum((pred-tf.transpose(y))**2 / n_samples)
And you will get the convergence to zero as expected in both cases.
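A quick NumPy check of why the fix is safe for both shapes (a sketch with the same values; like NumPy's .T, tf.transpose leaves a 1-D tensor unchanged):

```python
import numpy as np

pred = np.array([2., 3., 4.])  # predictions at the ideal parameters

# The transpose is a no-op for a 1-D array, so the fixed cost expression
# behaves identically whichever shape train_y has:
for y in (np.array([2., 3., 4.]),           # shape (3,)
          np.array([[2.], [3.], [4.]])):    # shape (3, 1)
    res = pred - y.T                        # (3,) or (1, 3) -- no (3, 3) blow-up
    print(np.sum(res**2 / 3))               # 0.0 in both cases
```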