Tensorflow gradient with respect to matrix


Question

Just for context, I'm trying to implement a gradient descent algorithm with Tensorflow.

I have a matrix X

[ x1 x2 x3 x4 ]
[ x5 x6 x7 x8 ]

which I multiply by some feature vector Y to get Z

      [ y1 ]
Z = X [ y2 ]  = [ z1 ]
      [ y3 ]    [ z2 ]
      [ y4 ]

I then put Z through a softmax function, and take the log. I'll refer to the output matrix as W.

All this is implemented as follows (little bit of boilerplate added so it's runnable)

import tensorflow as tf

sess = tf.Session()
num_features = 4
num_actions = 2

# X: the (2, 4) policy matrix, Y: the (4, 1) feature/state vector
policy_matrix = tf.get_variable("params", (num_actions, num_features))
state_ph = tf.placeholder("float", (num_features, 1))
# Z = X Y, then W = log(softmax(Z)) along the action dimension
action_linear = tf.matmul(policy_matrix, state_ph)
action_probs = tf.nn.softmax(action_linear, axis=0)
action_problogs = tf.log(action_probs)
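
For completeness, actually evaluating the graph would look something like this (a minimal sketch; the variable initializer and the concrete state values are illustrative additions):

sess.run(tf.global_variables_initializer())
example_state = [[1.0], [2.0], [3.0], [4.0]]  # hypothetical (4, 1) state vector
print(sess.run(action_problogs, feed_dict={state_ph: example_state}))
# prints a (2, 1) array of log-probabilities, i.e. [w1, w2]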

W (corresponding to action_problogs) looks like

[ w1 ]
[ w2 ]

I'd like to find the gradient of w1 with respect to the matrix X; that is, I'd like to calculate

          [ d/dx1 w1 ]
d/dX w1 =      .
               .
          [ d/dx8 w1 ]

(preferably still looking like a matrix so I can add it to X, but I'm really not concerned about that)

I was hoping that tf.gradients would do the trick. I calculated the "gradient" like so

problog_gradient = tf.gradients(action_problogs, policy_matrix)

However, when I inspect problog_gradient, here's what I get

[<tf.Tensor 'foo_4/gradients/foo_4/MatMul_grad/MatMul:0' shape=(2, 4) dtype=float32>]

Note that this has exactly the same shape as X, but that it really shouldn't. I was hoping to get a list of two gradients, each with respect to 8 elements. I suspect that I'm instead getting two gradients, but each with respect to four elements.
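
To make the expected result concrete: for a single entry like w1, what I have in mind is something like the following (just an illustration of the shape I'm after, not code I have verified):

w1_grad = tf.gradients(action_problogs[0, 0], policy_matrix)[0]
# action_problogs[0, 0] is the scalar w1, so w1_grad would have shape (2, 4):
# the gradient of w1 with respect to every entry of X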

I'm very new to tensorflow, so I'd appreciate an explanation of what's going on and how I might achieve the behavior I desire.

Answer

The gradient expects a scalar function, so by default, it sums up the entries. That is the default behavior simply because all of the gradient descent algorithms need that type of functionality, and stochastic gradient descent (or variations thereof) are the preferred methods inside Tensorflow. You won't find any of the more advanced algorithms (like BFGS or something) because they simply haven't been implemented yet (and they would require a true Jacobian, which also hasn't been implemented). For what it's worth, here is a functioning Jacobian implementation that I wrote:

def map(f, x, dtype=None, parallel_iterations=10):
    '''
    Apply f to each of the elements in x using the specified number of parallel iterations.

    Important points:
    1. By "elements in x", we mean that we will be applying f to x[0],...x[tf.shape(x)[0]-1].
    2. The output size of f(x[i]) can be arbitrary. However, if the dtype of that output
       is different than the dtype of x, then you need to specify that as an additional argument.
    '''
    if dtype is None:
        dtype = x.dtype

    n = tf.shape(x)[0]
    loop_vars = [
        tf.constant(0, n.dtype),
        tf.TensorArray(dtype, size=n),
    ]
    _, fx = tf.while_loop(
        lambda j, _: j < n,
        lambda j, result: (j + 1, result.write(j, f(x[j]))),
        loop_vars,
        parallel_iterations=parallel_iterations
    )
    return fx.stack()

def jacobian(fx, x, parallel_iterations=10):
    '''
    Given a tensor fx, which is a function of x, vectorize fx (via tf.reshape(fx, [-1])),
    and then compute the jacobian of each entry of fx with respect to x.
    Specifically, if x has shape (m,n,...,p), and fx has L entries (tf.size(fx)=L), then
    the output will be (L,m,n,...,p), where output[i] will be (m,n,...,p), with each entry denoting the
    gradient of output[i] wrt the corresponding element of x.
    '''
    return map(lambda fxi: tf.gradients(fxi, x)[0],
               tf.reshape(fx, [-1]),
               dtype=x.dtype,
               parallel_iterations=parallel_iterations)

While this implementation works, it does not work when you try to nest it. For instance, if you try to compute the Hessian by using jacobian( jacobian( ... )), then you get some strange errors. This is being tracked as Issue 675. I am still awaiting a response on why this throws an error. I believe that there is a deep-seated bug in either the while loop implementation or the gradient implementation, but I really have no idea.

Anyway, if you just need a jacobian, try the code above.
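
For the setup in the question, usage would look roughly like this (a sketch that assumes the tensors defined in the question; I haven't run this exact combination):

problog_jacobian = jacobian(action_problogs, policy_matrix)
# action_problogs has 2 entries and policy_matrix has shape (2, 4), so
# problog_jacobian has shape (2, 2, 4): problog_jacobian[i] is the gradient of
# the i-th log-probability with respect to the whole matrix X.
# The summed gradient returned by tf.gradients should then match
# tf.reduce_sum(problog_jacobian, axis=0).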
