Understanding TensorBoard (weight) histograms


Question

It is really straightforward to see and understand the scalar values in TensorBoard. However, it's not clear how to interpret the histogram graphs.

For example, here are the histograms of my network weights.

(After fixing a bug thanks to sunside.) What is the best way to interpret these? Layer 1 weights look mostly flat; what does this mean?

I added the network construction code here.

import tensorflow as tf

# input_size, hidden_layer_neurons, output_size and learning_rate are defined elsewhere.
X = tf.placeholder(tf.float32, [None, input_size], name="input_x")
x_image = tf.reshape(X, [-1, 6, 10, 1])
tf.summary.image('input', x_image, 4)

# First layer of weights
with tf.name_scope("layer1"):
    W1 = tf.get_variable("W1", shape=[input_size, hidden_layer_neurons],
                         initializer=tf.contrib.layers.xavier_initializer())
    layer1 = tf.matmul(X, W1)
    layer1_act = tf.nn.tanh(layer1)
    tf.summary.histogram("weights", W1)
    tf.summary.histogram("layer", layer1)
    tf.summary.histogram("activations", layer1_act)

# Second layer of weights
with tf.name_scope("layer2"):
    W2 = tf.get_variable("W2", shape=[hidden_layer_neurons, hidden_layer_neurons],
                         initializer=tf.contrib.layers.xavier_initializer())
    layer2 = tf.matmul(layer1_act, W2)
    layer2_act = tf.nn.tanh(layer2)
    tf.summary.histogram("weights", W2)
    tf.summary.histogram("layer", layer2)
    tf.summary.histogram("activations", layer2_act)

# Third layer of weights
with tf.name_scope("layer3"):
    W3 = tf.get_variable("W3", shape=[hidden_layer_neurons, hidden_layer_neurons],
                         initializer=tf.contrib.layers.xavier_initializer())
    layer3 = tf.matmul(layer2_act, W3)
    layer3_act = tf.nn.tanh(layer3)

    tf.summary.histogram("weights", W3)
    tf.summary.histogram("layer", layer3)
    tf.summary.histogram("activations", layer3_act)

# Fourth layer of weights
with tf.name_scope("layer4"):
    W4 = tf.get_variable("W4", shape=[hidden_layer_neurons, output_size],
                         initializer=tf.contrib.layers.xavier_initializer())
    Qpred = tf.nn.softmax(tf.matmul(layer3_act, W4)) # Bug fixed: Qpred = tf.nn.softmax(tf.matmul(layer3, W4))
    tf.summary.histogram("weights", W4)
    tf.summary.histogram("Qpred", Qpred)

# We need to define the parts of the network needed for learning a policy
Y = tf.placeholder(tf.float32, [None, output_size], name="input_y")
advantages = tf.placeholder(tf.float32, name="reward_signal")

# Loss function
# Sum (Ai*logp(yi|xi))
log_lik = -Y * tf.log(Qpred)
loss = tf.reduce_mean(tf.reduce_sum(log_lik * advantages, axis=1))
tf.summary.scalar("Q", tf.reduce_mean(Qpred))
tf.summary.scalar("Y", tf.reduce_mean(Y))
tf.summary.scalar("log_likelihood", tf.reduce_mean(log_lik))
tf.summary.scalar("loss", loss)

# Learning
train = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
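
For completeness: the histograms above only show up in TensorBoard once the summaries are merged and written to disk. A minimal sketch of that part, assuming a TF 1.x session and a hypothetical log directory ./logs (num_steps, batch_x, batch_y and batch_adv are placeholders for your own training loop):

# Merge every summary defined above and write them out so TensorBoard can read them.
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter("./logs", tf.get_default_graph())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):  # num_steps and batch_* are hypothetical
        summary, _ = sess.run([merged, train],
                              feed_dict={X: batch_x, Y: batch_y, advantages: batch_adv})
        writer.add_summary(summary, step)  # one histogram slice per step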

Answer

It appears that the network hasn't learned anything in layers one to three. The last layer does change, which means that either there may be something wrong with the gradients (if you're tampering with them manually), you are constraining learning to the last layer by optimizing only its weights, or the last layer really 'eats up' all the error. It could also be that only biases are learned. The network does appear to learn something, but it might not be using its full potential. More context would be needed here, but playing around with the learning rate (e.g. using a smaller one) might be worth a shot.

In general, histograms display the number of occurrences of a value relative to other values. Simply speaking, if the possible values are in the range 0..9 and you see a spike of height 10 at the value 0, this means that 10 inputs assume the value 0; in contrast, if the histogram shows a plateau of 1 for all values in 0..9, it means that for 10 inputs, each possible value 0..9 occurs exactly once. You can also use histograms to visualize probability distributions by normalizing all histogram values by their total sum; if you do that, you intuitively obtain the likelihood with which a certain value (on the x axis) will appear, compared to other inputs.
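
As a tiny numpy illustration of the spike vs. plateau picture and of normalizing counts into probabilities (the numbers here are made up for the example):

import numpy as np

spike = np.zeros(10)      # ten inputs that all take the value 0 -> one spike of height 10
plateau = np.arange(10)   # ten inputs taking each value 0..9 exactly once -> a plateau of height 1

counts, _ = np.histogram(plateau, bins=10, range=(0, 10))
print(counts)                  # [1 1 1 1 1 1 1 1 1 1]

probs = counts / counts.sum()  # normalize by the total sum
print(probs)                   # each value is equally likely: 0.1 per bin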

Now for layer1/weights, the plateau means that:

  • most of the weights are in the range of -0.15 to 0.15
  • it is (more or less) equally likely for a weight to have any of these values, i.e. they are (almost) uniformly distributed

Said differently, almost the same number of weights have the values -0.15, 0.0, 0.15 and everything in between, while a few weights have slightly smaller or larger values. So in short, this simply looks like the weights have been initialized using a uniform distribution with zero mean and value range -0.15..0.15, give or take. If you do indeed use uniform initialization, then this is typical when the network has not been trained yet.
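
For what it's worth, tf.contrib.layers.xavier_initializer() samples from a uniform distribution by default, bounded by sqrt(6 / (fan_in + fan_out)). A quick numpy check with hypothetical layer sizes (not the actual dimensions from the question) reproduces the flat shape:

import numpy as np

fan_in, fan_out = 60, 200                   # hypothetical layer sizes
limit = np.sqrt(6.0 / (fan_in + fan_out))   # Xavier/Glorot uniform bound

W = np.random.uniform(-limit, limit, size=(fan_in, fan_out))
print(limit)                                # ~0.15 for these sizes
print(np.histogram(W, bins=10)[0])          # roughly equal counts per bin: the plateau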

In comparison, layer1/activations forms a bell-curve (Gaussian)-like shape: the values are centered around a specific value, in this case 0, but they may also be greater or smaller than that (equally likely so, since the shape is symmetric). Most values appear close to the mean of 0, but values do range from -0.8 to 0.8. I assume that layer1/activations is taken as the distribution over all layer outputs in a batch. You can see that the values do change over time.
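
The bell shape is what you'd expect from the matmul: each pre-activation sums many independent terms, so it tends toward a Gaussian, and tanh keeps it bell-shaped as long as the pre-activations stay small. A small numpy sketch with the same hypothetical sizes as above:

import numpy as np

batch, fan_in, fan_out = 1000, 60, 200                    # hypothetical sizes
limit = np.sqrt(6.0 / (fan_in + fan_out))

X = np.random.uniform(-1.0, 1.0, size=(batch, fan_in))    # made-up inputs
W = np.random.uniform(-limit, limit, size=(fan_in, fan_out))

act = np.tanh(X @ W)   # sums of fan_in independent terms -> roughly Gaussian, squashed by tanh
print(np.histogram(act, bins=10, range=(-1, 1))[0])       # counts peak around 0: the bell shape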

The layer 4 histogram doesn't tell me anything specific. From the shape, it just shows that some weight values around -0.1, 0.05 and 0.25 tend to occur with a higher probability; a reason could be that different parts of each neuron there actually pick up the same information and are basically redundant. This can mean that you could actually use a smaller network, or that your network has the potential to learn more distinguishing features in order to prevent overfitting. These are just assumptions though.

Also, as already stated in the comments, do add bias units. By leaving them out, you are forcefully constraining your network to a possibly invalid solution.
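
A minimal sketch of what layer1 could look like with a bias unit (the same pattern applies to the other layers); b1 is an added name, everything else reuses the question's code:

with tf.name_scope("layer1"):
    W1 = tf.get_variable("W1", shape=[input_size, hidden_layer_neurons],
                         initializer=tf.contrib.layers.xavier_initializer())
    b1 = tf.get_variable("b1", shape=[hidden_layer_neurons],
                         initializer=tf.zeros_initializer())
    layer1 = tf.matmul(X, W1) + b1          # add the bias before the nonlinearity
    layer1_act = tf.nn.tanh(layer1)
    tf.summary.histogram("weights", W1)
    tf.summary.histogram("biases", b1)      # watch the biases in TensorBoard as well
    tf.summary.histogram("layer", layer1)
    tf.summary.histogram("activations", layer1_act)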
