Tensorflow: compute Hessian with respect to each sample


Question

I have a tensor X of size M x D. We can interpret each row of X as a training sample and each column as a feature.

X is used to compute a tensor u of size M x 1 (in other words, u depends on X in the computational graph). We can interpret this as a vector of predictions; one for each sample. In particular, the m-th row of u is computed using only the m-th row of X.

Now, if I run tf.gradients(u, X)[0], I obtain an M x D tensor corresponding to the "per-sample" gradient of u with respect to X.

How can I similarly compute the "per-sample" Hessian tensor? (i.e., an M x D x D quantity)

Addendum: Peter's answer below is correct. I also found a different approach using stacking and unstacking (using Peter's notation):

hess2 = tf.stack([
    tf.gradients( tmp, a )[ 0 ]                   # gradient of one column of grad, shape M x D
    for tmp in tf.unstack( grad, num=5, axis=1 )  # num=5 is D, the number of features
], axis = 2)                                      # result: M x D x D

In Peter's example, D=5 is the number of features. I suspect (but have not checked) that the above is faster for large M, as it skips over the zero entries mentioned in Peter's answer.
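As a quick framework-free sanity check of the math (an illustrative NumPy sketch, not part of the original question, using the same a and b as Peter's example below): since c_sq[m] = (a[m] · b)², each per-sample Hessian is analytically 2·b·bᵀ, and a finite-difference estimate confirms it:

```python
import numpy as np

# Same setup as Peter's example below: c_sq[m] = (a[m] @ b)**2,
# so each per-sample D x D Hessian is analytically 2 * outer(b, b).
a = np.array([[1.0] * 5, [2.0] * 5, [3.0] * 5])  # M=3 samples, D=5 features
b = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

analytic = 2.0 * np.outer(b, b)  # per-sample Hessian, identical for every m

def f(row):
    return (row @ b) ** 2

# Finite-difference estimate of the Hessian for sample m=0
eps = 1e-4
D = b.size
fd = np.zeros((D, D))
for i in range(D):
    for j in range(D):
        e_i = np.eye(D)[i] * eps
        e_j = np.eye(D)[j] * eps
        fd[i, j] = (f(a[0] + e_i + e_j) - f(a[0] + e_i)
                    - f(a[0] + e_j) + f(a[0])) / eps ** 2

print(np.allclose(fd, analytic, atol=1e-3))  # True
```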

Answer

tf.hessians() calculates the Hessian of the provided ys with respect to xs regardless of their dimensions. Since xs has dimension M x D, the result will be of dimension M x D x M x D. But because the outputs for each exemplar are independent of one another, most of the Hessian is zero; only one slice along the third dimension has any nonzero values. Therefore, to get your desired result, you can either take the diagonal in the two M dimensions or, much more easily, simply sum over and thereby eliminate the third dimension, like so:

hess2 = tf.reduce_sum( hess, axis = 2 )
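To see why summing over axis 2 works, here is a minimal NumPy sketch (added for illustration, not from the original answer) of the block-diagonal M x D x M x D structure described above:

```python
import numpy as np

M, D = 3, 5
# Per-sample Hessian block for Peter's example: 2 * outer(b, b) with b = [1..5]
block = 2.0 * np.outer(np.arange(1, D + 1), np.arange(1, D + 1))

# Build the full M x D x M x D Hessian: zero everywhere except where the
# two sample indices coincide, since outputs are independent per sample.
full = np.zeros((M, D, M, D))
for m in range(M):
    full[m, :, m, :] = block

# Summing over axis 2 collapses the all-zero cross-sample slices,
# leaving the M x D x D per-sample Hessians.
hess2 = full.sum(axis=2)

# Identical to taking the diagonal in the two M dimensions.
diag = np.stack([full[m, :, m, :] for m in range(M)])
print(np.array_equal(hess2, diag))  # True
```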

Example code (tested):

import tensorflow as tf

a = tf.constant( [ [ 1.0, 1, 1, 1, 1 ], [ 2, 2, 2, 2, 2 ], [ 3, 3, 3, 3, 3 ] ] )  # M=3, D=5
b = tf.constant( [ [ 1.0 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ] ] )
c = tf.matmul( a, b )
c_sq = tf.square( c )                     # per-sample output, shape M x 1

grad = tf.gradients( c_sq, a )[ 0 ]       # per-sample gradient, shape M x D

hess = tf.hessians( c_sq, a )[ 0 ]        # full Hessian, shape M x D x M x D
hess2 = tf.reduce_sum( hess, axis = 2 )   # per-sample Hessian, shape M x D x D


with tf.Session() as sess:
    res = sess.run( [ c_sq, grad, hess2 ] )

    for v in res:
        print( v.shape )
        print( v )
        print( "=======================" )

will output:

(3, 1)
[[ 225.]
 [ 900.]
 [2025.]]
=======================
(3, 5)
[[ 30.  60.  90. 120. 150.]
 [ 60. 120. 180. 240. 300.]
 [ 90. 180. 270. 360. 450.]]
=======================
(3, 5, 5)
[[[ 2.  4.  6.  8. 10.]
  [ 4.  8. 12. 16. 20.]
  [ 6. 12. 18. 24. 30.]
  [ 8. 16. 24. 32. 40.]
  [10. 20. 30. 40. 50.]]

 [[ 2.  4.  6.  8. 10.]
  [ 4.  8. 12. 16. 20.]
  [ 6. 12. 18. 24. 30.]
  [ 8. 16. 24. 32. 40.]
  [10. 20. 30. 40. 50.]]

 [[ 2.  4.  6.  8. 10.]
  [ 4.  8. 12. 16. 20.]
  [ 6. 12. 18. 24. 30.]
  [ 8. 16. 24. 32. 40.]
  [10. 20. 30. 40. 50.]]]
=======================

