Meaning of Weight Gradient in CNN


Question




I developed a CNN using MatConvNet and am able to visualize the weights of the 1st layer. They looked very similar to what is shown here (also attached below in case I am not specific enough): http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html

My question is: what are the weight gradients? I'm not sure what those are and am unable to generate them...

Solution

Weights in a NN

In a neural network, a series of linear functions, represented as matrices, is applied to the features (usually with a nonlinearity between them). These functions are determined by the values in the matrices, referred to as weights.
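As a rough sketch of that description, here is a tiny two-layer network in NumPy; all sizes, values, and the choice of ReLU are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4 input features, 3 hidden units, 2 outputs.
W1 = rng.normal(size=(3, 4))   # weights of the first linear function
b1 = np.zeros(3)
W2 = rng.normal(size=(2, 3))   # weights of the second linear function
b2 = np.zeros(2)

def relu(z):
    return np.maximum(z, 0.0)  # a common nonlinearity between the layers

def forward(x):
    h = relu(W1 @ x + b1)      # linear function, then nonlinearity
    return W2 @ h + b2         # another linear function

x = rng.normal(size=4)
y = forward(x)
print(y.shape)  # (2,)
```

The "weights" being visualized are exactly the entries of matrices like `W1` and `W2`.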

You can visualize the weights of an ordinary neural network, but visualizing the convolutional layers of a CNN usually means something slightly different: those layers are designed to learn feature computations that are applied across the whole spatial extent of the input.

When you visualize the weights, you're looking for patterns. A nice smooth filter may mean that the weights are well learned and "looking for something in particular". A noisy weight visualization may mean that you've undertrained your network, overfit it, need more regularization, or something else nefarious.

In reviews of weight visualizations, you can see patterns start to emerge once the weights are treated as images.
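A minimal sketch of "treating the weights as images": tile the first-layer filters into one 2D array and hand it to any image viewer. The filter count and size below are invented for illustration; real first-layer weights would come from your trained network.

```python
import numpy as np

# Hypothetical first-layer weights: 16 filters, each 5x5, single channel.
rng = np.random.default_rng(0)
filters = rng.normal(size=(16, 5, 5))

def tile_filters(w, rows=4, cols=4):
    """Arrange a stack of filters into one 2D array for display."""
    n, fh, fw = w.shape
    grid = np.zeros((rows * fh, cols * fw))
    for i in range(n):
        r, c = divmod(i, cols)
        grid[r*fh:(r+1)*fh, c*fw:(c+1)*fw] = w[i]
    return grid

grid = tile_filters(filters)
print(grid.shape)  # (20, 20)
# With matplotlib installed, e.g.: plt.imshow(grid, cmap="gray")
```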


Weight Gradients

"Visualizing the gradient" means taking the gradient matrix and treating like an image [1], just like you took the weight matrix and treated it like an image before.

A gradient is just a derivative; for images, it's usually computed as a finite difference - grossly simplified, the X gradient subtracts pixels next to each other in a row, and the Y gradient subtracts pixels next to each other in a column.
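That finite-difference description can be sketched directly with NumPy's `np.diff`; the tiny "filter" with a vertical edge below is made up for illustration:

```python
import numpy as np

# A toy 4x4 "filter" with a vertical edge: left half dark, right half bright.
w = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

# X gradient: subtract horizontally adjacent entries in each row.
gx = np.diff(w, axis=1)
# Y gradient: subtract vertically adjacent entries in each column.
gy = np.diff(w, axis=0)

print(gx)  # strong response (1s) exactly at the vertical edge
print(gy)  # all zeros: there is no horizontal structure
```

Visualizing `gx` and `gy` as images immediately reveals the edge orientation the filter responds to.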

For the common example of a filter that extracts edges, we may see a strong gradient in a particular direction. By visualizing the gradients (taking the matrix of finite differences and treating it like an image), you can get a more immediate idea of how your filter is operating on the input. There are a lot of cutting-edge techniques for interpreting these results, but making the image pop up is the easy part!

A similar technique involves visualizing the activations after a forward pass over the input. In this case, you're looking at how the input was changed by the weights; by visualizing the weights, you're looking at how you expect them to change the input.
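A sketch of that activation idea, using a hand-rolled "valid" cross-correlation; the input image and the edge filter here are invented, and a real CNN framework would compute this for you:

```python
import numpy as np

# Toy 6x6 input with a vertical edge, and a hypothetical edge filter.
img = np.zeros((6, 6))
img[:, 3:] = 1.0
filt = np.array([[-1.0, 1.0]])   # responds to left-to-right increases

def conv2d_valid(x, k):
    """Plain 'valid' cross-correlation, just enough to get an activation map."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

act = conv2d_valid(img, filt)    # the activation map after a forward pass
print(act)  # nonzero only where the edge sits
```

Shown as an image, `act` lights up only along the edge, which is exactly "how the input was changed by the weights".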

Don't over-think it - the weights are interesting because they let us see how the function behaves, and the gradients of the weights are just another feature to help explain what's going on. There's nothing sacred about that feature: t-SNE clusterings of learned features, for example, are another popular way to look at the separability of the space.


[1] It can be more complicated if you introduce weight sharing, but not that much
