Understanding TensorBoard (weight) histograms

Problem description

It is really straightforward to see and understand the scalar values in TensorBoard. However, it's not clear how to understand histogram graphs.

For example, these are the histograms of my network weights.

(After fixing a bug, thanks to sunside) What is the best way to interpret these? The layer 1 weights look mostly flat; what does that mean?

I have added the network construction code below.

import tensorflow as tf

# input_size, hidden_layer_neurons, output_size and learning_rate are hyperparameters defined elsewhere
X = tf.placeholder(tf.float32, [None, input_size], name="input_x")
x_image = tf.reshape(X, [-1, 6, 10, 1])
tf.summary.image('input', x_image, 4)

# First layer of weights
with tf.name_scope("layer1"):
    W1 = tf.get_variable("W1", shape=[input_size, hidden_layer_neurons],
                         initializer=tf.contrib.layers.xavier_initializer())
    layer1 = tf.matmul(X, W1)
    layer1_act = tf.nn.tanh(layer1)
    tf.summary.histogram("weights", W1)
    tf.summary.histogram("layer", layer1)
    tf.summary.histogram("activations", layer1_act)

# Second layer of weights
with tf.name_scope("layer2"):
    W2 = tf.get_variable("W2", shape=[hidden_layer_neurons, hidden_layer_neurons],
                         initializer=tf.contrib.layers.xavier_initializer())
    layer2 = tf.matmul(layer1_act, W2)
    layer2_act = tf.nn.tanh(layer2)
    tf.summary.histogram("weights", W2)
    tf.summary.histogram("layer", layer2)
    tf.summary.histogram("activations", layer2_act)

# Third layer of weights
with tf.name_scope("layer3"):
    W3 = tf.get_variable("W3", shape=[hidden_layer_neurons, hidden_layer_neurons],
                         initializer=tf.contrib.layers.xavier_initializer())
    layer3 = tf.matmul(layer2_act, W3)
    layer3_act = tf.nn.tanh(layer3)

    tf.summary.histogram("weights", W3)
    tf.summary.histogram("layer", layer3)
    tf.summary.histogram("activations", layer3_act)

# Fourth layer of weights
with tf.name_scope("layer4"):
    W4 = tf.get_variable("W4", shape=[hidden_layer_neurons, output_size],
                         initializer=tf.contrib.layers.xavier_initializer())
    Qpred = tf.nn.softmax(tf.matmul(layer3_act, W4)) # Bug fixed: Qpred = tf.nn.softmax(tf.matmul(layer3, W4))
    tf.summary.histogram("weights", W4)
    tf.summary.histogram("Qpred", Qpred)

# We need to define the parts of the network needed for learning a policy
Y = tf.placeholder(tf.float32, [None, output_size], name="input_y")
advantages = tf.placeholder(tf.float32, name="reward_signal")

# Loss function
# Sum (Ai*logp(yi|xi))
log_lik = -Y * tf.log(Qpred)
loss = tf.reduce_mean(tf.reduce_sum(log_lik * advantages, axis=1))
tf.summary.scalar("Q", tf.reduce_mean(Qpred))
tf.summary.scalar("Y", tf.reduce_mean(Y))
tf.summary.scalar("log_likelihood", tf.reduce_mean(log_lik))
tf.summary.scalar("loss", loss)

# Learning
train = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
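
(For completeness: these summary ops only show up in TensorBoard after being merged and written out with a FileWriter. Below is a minimal sketch assuming TF 1.x; the log directory, batch tensors and step count are placeholders and not part of the original code.)

merged = tf.summary.merge_all()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter("./logs", sess.graph)   # logdir is a placeholder
    for step in range(1000):                               # number of training steps is a placeholder
        summary, _ = sess.run([merged, train],
                              feed_dict={X: batch_x, Y: batch_y, advantages: batch_adv})  # placeholder batches
        writer.add_summary(summary, step)                  # writes one histogram slice per step
    writer.close()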

Answer

It appears that the network hasn't learned anything in layers one to three. The last layer does change, which means one of three things: something may be wrong with the gradients (if you are tampering with them manually), learning is constrained to the last layer because only its weights are being optimized, or the last layer really does 'eat up' all of the error. It could also be that only the biases are being learned. The network does appear to learn something, but it might not be using its full potential. More context would be needed here, but playing around with the learning rate (e.g. using a smaller one) might be worth a shot.

In general, histograms display the number of occurrences of a value relative to one another. Simply speaking, if the possible values are in a range of 0..9 and you see a spike of amount 10 on the value 0, this means that 10 inputs assume the value 0; in contrast, if the histogram shows a plateau of 1 for all values of 0..9, it means that for 10 inputs, each possible value 0..9 occurs exactly once. You can also use histograms to visualize probability distributions when you normalize all histogram values by their total sum; if you do that, you will intuitively obtain the likelihood with which a certain value (on the x axis) will appear, compared to the other inputs.
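
A tiny NumPy sketch of the counting and normalization described above (the input array is constructed to reproduce the "plateau of 1" example, not taken from the question):

import numpy as np

# ten inputs where each value 0..9 occurs exactly once -> a plateau of height 1
values = np.arange(10)
counts, _ = np.histogram(values, bins=10, range=(0, 10))
print(counts)                   # [1 1 1 1 1 1 1 1 1 1]

# normalizing by the total sum turns the counts into an empirical probability distribution
probs = counts / counts.sum()
print(probs)                    # each value is equally likely: 0.1 per bin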

Now for layer1/weights, the plateau means that:

  • most of the weights are in the range -0.15 to 0.15
  • it is (almost) equally likely for a weight to take any of these values, i.e. they are (almost) uniformly distributed

Said differently, almost the same number of weights have the values -0.15, 0.0, 0.15 and everything in between. There are some weights with slightly smaller or larger values. So in short, this simply looks like the weights have been initialized using a uniform distribution with zero mean and value range -0.15..0.15 ... give or take. If you do indeed use uniform initialization, then this is typical when the network has not been trained yet.
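
For comparison, explicitly initializing a weight matrix from a uniform distribution in that range would produce exactly this kind of flat histogram before any training (the variable name and shape below are illustrative, not from the question; note that tf.contrib.layers.xavier_initializer also samples from a uniform distribution by default, just with a layer-dependent range):

W_uniform = tf.get_variable(
    "W_uniform", shape=[60, 256],                            # illustrative shape
    initializer=tf.random_uniform_initializer(minval=-0.15, maxval=0.15))
tf.summary.histogram("uniform_weights", W_uniform)           # plateau between -0.15 and 0.15 until training moves the weights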

In comparison, layer1/activations forms a bell-curve (Gaussian-like) shape: the values are centered around a specific value, in this case 0, but they may also be greater or smaller than that (equally likely so, since the distribution is symmetric). Most values appear close to the mean of 0, but values do range from -0.8 to 0.8. I assume that layer1/activations is taken as the distribution over all layer outputs in a batch. You can see that the values do change over time.
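
To see why a tanh layer on top of roughly uniform weights gives a zero-centered, bell-like histogram, here is a rough NumPy illustration (batch size, layer sizes and input range are made up for the sketch):

import numpy as np

rng = np.random.default_rng(0)
X_batch = rng.uniform(-1.0, 1.0, size=(128, 60))       # hypothetical input batch
W = rng.uniform(-0.15, 0.15, size=(60, 256))           # uniformly initialized weights, as above
activations = np.tanh(X_batch @ W)                     # sums of many small terms -> roughly Gaussian, then squashed by tanh
counts, _ = np.histogram(activations, bins=20, range=(-1, 1))
print(counts)                                          # counts peak around 0 and taper off toward the edges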

The layer 4 histogram doesn't tell me anything specific. From the shape, it just shows that some weight values around -0.1, 0.05 and 0.25 tend to occur with a higher probability; a reason could be that different parts of each neuron there actually pick up the same information and are basically redundant. This can mean that you could actually use a smaller network, or that your network has the potential to learn more distinguishing features in order to prevent overfitting. These are just assumptions, though.

Also, as already stated in the comments below, do add bias units. By leaving them out, you are forcefully constraining your network to a possibly invalid solution.
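
As a minimal sketch of what that change could look like for the first layer (the variable name b1 and the zero initializer are illustrative choices, not taken from the question):

with tf.name_scope("layer1"):
    W1 = tf.get_variable("W1", shape=[input_size, hidden_layer_neurons],
                         initializer=tf.contrib.layers.xavier_initializer())
    b1 = tf.get_variable("b1", shape=[hidden_layer_neurons],
                         initializer=tf.zeros_initializer())   # bias unit
    layer1 = tf.matmul(X, W1) + b1                             # affine transform instead of a purely linear one
    layer1_act = tf.nn.tanh(layer1)
    tf.summary.histogram("weights", W1)
    tf.summary.histogram("biases", b1)                         # worth monitoring as well
    tf.summary.histogram("activations", layer1_act)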
