Tensorflow RNN input size

Question

I am trying to use tensorflow to create a recurrent neural network. My code is something like this:

import tensorflow as tf

# A GRU cell with 3 hidden units (num_units = 3).
rnn_cell = tf.nn.rnn_cell.GRUCell(3)

# Two time steps; each step is a batch of one 2-dimensional input vector.
inputs = [tf.constant([[0, 1]], dtype=tf.float32), tf.constant([[2, 3]], dtype=tf.float32)]

# Unroll the cell over the input sequence (legacy tf.nn.rnn API).
outputs, end = tf.nn.rnn(rnn_cell, inputs, dtype=tf.float32)

Now, everything runs just fine. However, I am rather confused by what is actually going on. The output dimensions are always the batch size x the size of the rnn cell's hidden state - how can they be completely independent of the input size?
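
For example (a minimal check on the snippet above, nothing beyond what the static shapes already report): with a batch of 1 and 3 hidden units, every output comes back as [1, 3], even though each input vector has size 2.

import tensorflow as tf

rnn_cell = tf.nn.rnn_cell.GRUCell(3)
inputs = [tf.constant([[0, 1]], dtype=tf.float32),
          tf.constant([[2, 3]], dtype=tf.float32)]
outputs, end = tf.nn.rnn(rnn_cell, inputs, dtype=tf.float32)

# Each per-step output is batch_size x num_units, independent of the input size.
print([o.get_shape().as_list() for o in outputs])   # [[1, 3], [1, 3]]
print(end.get_shape().as_list())                    # [1, 3]  (final hidden state)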

If my understanding is correct, the inputs are concatenated to the rnn's hidden state at each step, and then multiplied by a weight matrix (among other operations). This means that the dimensions of the weight matrix need to depend on the input size, which is impossible, because the rnn_cell is created before the inputs are even declared!

Answer

After seeing the answer to a question about tensorflow's GRU implementation, I've realized what's going on. Counter to my intuition, the GRUCell constructor doesn't create any weight or bias variables at all. Instead, it creates its own variable scope, and then instantiates the variables on demand when actually called. Tensorflow's variable scoping mechanism ensures that the variables are only created once, and shared across subsequent calls to the GRU.
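
Roughly, the pattern looks like the sketch below. This is not GRUCell's actual code; gru_like_step and its weight shapes are made up to illustrate the idea, and it sticks to the same pre-1.0 API (tf.nn.rnn-era tf.concat and variable scopes) as the question.

import tensorflow as tf

def gru_like_step(inputs, state, num_units, scope="MyCell"):
    # Hypothetical cell body: the weight shapes depend on the input size, which
    # is only known here, once the cell is applied to a concrete tensor.
    input_size = inputs.get_shape()[1].value
    with tf.variable_scope(scope):
        W = tf.get_variable("W", [input_size + num_units, num_units])
        b = tf.get_variable("b", [num_units], initializer=tf.constant_initializer(0.0))
        return tf.tanh(tf.matmul(tf.concat(1, [inputs, state]), W) + b)

x1 = tf.constant([[0., 1.]])
x2 = tf.constant([[2., 3.]])
state = tf.zeros([1, 3])

with tf.variable_scope("rnn") as vs:
    state = gru_like_step(x1, state, 3)   # first call creates rnn/MyCell/W and rnn/MyCell/b
    vs.reuse_variables()                  # later calls share the existing variables
    state = gru_like_step(x2, state, 3)

The first call fixes the weight shapes from the actual input tensor; every later call inside the reusing scope just looks the same variables up again, which is why GRUCell(3) can be constructed before any inputs exist.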

I'm not sure why they decided to go with this rather confusing implementation, which, as far as I can tell, is undocumented. To me it seems more appropriate to use python's object-level variable scoping to encapsulate the tensorflow variables within the GRUCell itself, rather than relying on an additional implicit scoping mechanism.
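
For comparison, here is a minimal sketch of the object-level alternative I have in mind (a hypothetical SimpleRNNCell, not anything that exists in tensorflow, again using the pre-1.0 tf.concat signature); the obvious cost is that the constructor now has to be told the input size.

import tensorflow as tf

class SimpleRNNCell(object):
    # Hypothetical alternative: the variables are ordinary attributes of the
    # Python object, created eagerly in the constructor.
    def __init__(self, input_size, num_units):
        self.num_units = num_units
        self.W = tf.Variable(tf.truncated_normal([input_size + num_units, num_units], stddev=0.1))
        self.b = tf.Variable(tf.zeros([num_units]))

    def __call__(self, inputs, state):
        # Same computation as before: concatenate input and state, multiply, squash.
        return tf.tanh(tf.matmul(tf.concat(1, [inputs, state]), self.W) + self.b)

cell = SimpleRNNCell(input_size=2, num_units=3)   # input size must be known up front
state = tf.zeros([1, 3])
for x in [tf.constant([[0., 1.]]), tf.constant([[2., 3.]])]:
    state = cell(x, state)

Whether that is nicer is debatable; the variable-scope design is precisely what lets GRUCell(3) be built without knowing the input size, which is why the snippet in the question works at all.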
