How do you decide the parameters of a Convolutional Neural Network for image classification?

Question

I am using Convolutional Neural Networks (unsupervised feature learning to detect features + a softmax regression classifier) for image classification. I have gone through all the tutorials by Andrew Ng in this area (http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial).

The network that I have developed has:

  • Input layer - size 8x8 (64 neurons)
  • Hidden layer - size 400 neurons
  • Output layer - size 3

I have learnt the weights connecting the input layer to the hidden layer using a sparse autoencoder and hence have 400 different features.

By taking contiguous 8x8 patches from an input image (64x64) and feeding them to the input layer, I get 400 feature maps of size 57x57.

I then use max pooling with a window of size 19 x 19 to obtain 400 feature maps of size 3x3.

I feed these feature maps to a softmax layer to classify the image into 3 different categories.
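As a quick check of the sizes quoted above, here is a minimal sketch of the arithmetic. It assumes an unpadded ("valid") convolution with stride 1 and non-overlapping pooling windows, which is what the 57x57 and 3x3 figures imply; the function names are made up for illustration.

```python
def valid_conv_size(input_size, patch_size, stride=1):
    """Side length of a feature map from an unpadded ('valid') convolution."""
    return (input_size - patch_size) // stride + 1


def pooled_size(map_size, window):
    """Side length after non-overlapping pooling with a square window."""
    return map_size // window


num_features = 400                       # hidden units learnt by the sparse autoencoder
conv_side = valid_conv_size(64, 8)       # 64 - 8 + 1 = 57
pool_side = pooled_size(conv_side, 19)   # 57 / 19 = 3
softmax_inputs = num_features * pool_side * pool_side  # 400 * 3 * 3 = 3600

print(conv_side, pool_side, softmax_inputs)  # 57 3 3600

# The same 8x8 patches on a 400x400 image give much larger feature maps.
large_side = valid_conv_size(400, 8)     # 400 - 8 + 1 = 393
print(large_side)                        # 393
```

So the softmax layer sees 3600 inputs per 64x64 image, and a 400x400 image would produce 393x393 maps before pooling.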

These parameters, such as the number of hidden layers (depth of the network) and the number of neurons per layer, were suggested in the tutorials because they had been used successfully on one particular dataset in which all images were of size 64x64.

I would like to extend this to my own dataset, where the images are much larger (say 400x400). How do I decide on:

  1. The number of layers.

  2. The number of neurons per layer.

  3. The size of the pooling window (max pooling).

Solution

The number of hidden layers: The number of hidden layers required depends on the intrinsic complexity of your dataset. This can be understood by looking at what each layer achieves:

  • Zero hidden layers allow the network to model only a linear function. This is inadequate for most image recognition tasks.

  • One hidden layer allows the network to approximate an arbitrarily complex function, provided it has enough neurons. This is adequate for many image recognition tasks.

  • Theoretically, two hidden layers offer little benefit over a single layer. In practice, however, some tasks may find an additional layer beneficial. This should be treated with caution, as a second layer can cause over-fitting. Using more than two hidden layers is only beneficial for especially complex tasks, or when a very large amount of training data is available (updated based on Evgeni Sergeev's comment).

To cut a long story short, if you have time then test both one and two hidden layers to see which achieves the most satisfactory results. If you do not have time then you should take a punt on a single hidden layer, and you will not go far wrong.

The number of convolutional layers: In my experience, the more convolutional layers the better (within reason, as each convolutional layer reduces the number of input features to the fully connected layers). After about two or three layers, though, the accuracy gain becomes rather small, so you need to decide whether your main focus is generalisation accuracy or training time. That said, all image recognition tasks are different, so the best method is simply to try incrementing the number of convolutional layers one at a time until you are satisfied with the result.

The number of nodes per hidden layer: Yet again, there is no magic formula for deciding on the number of nodes; it is different for each task. A rough guide is to use a number of nodes 2/3 the size of the previous layer, with the first layer 2/3 the size of the final feature maps. This is only a rough guide, however, and again depends on the dataset. Another commonly used option is to start with an excessive number of nodes and then remove the unnecessary ones through pruning.
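To make the 2/3 guide concrete, here is a minimal sketch that applies it to the layout from the question. It assumes "the size of the final feature maps" means the total number of pooled features (400 maps of 3x3, so 3600 values); the helper name is made up for illustration, and the result is a starting point to test rather than a recommendation.

```python
def two_thirds_sizes(input_features, num_hidden_layers):
    """Suggest hidden-layer sizes, each 2/3 the size of the layer before it."""
    sizes = []
    previous = input_features
    for _ in range(num_hidden_layers):
        previous = round(previous * 2 / 3)
        sizes.append(previous)
    return sizes


final_feature_count = 400 * 3 * 3  # 400 pooled feature maps of size 3x3
print(two_thirds_sizes(final_feature_count, 1))  # [2400]
print(two_thirds_sizes(final_feature_count, 2))  # [2400, 1600]
```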

Max pooling window size: I have always applied max pooling straight after convolution, so I am perhaps not qualified to make suggestions on the window size you should use. That said, 19x19 max pooling seems overly severe, since it keeps only one value out of every 361 and so throws most of your data away. Perhaps you should look at a more conventional LeNet network layout:

http://deeplearning.net/tutorial/lenet.html

https://www.youtube.com/watch?v=n6hpQwq7Inw

In this layout you repeatedly perform convolution (usually 5x5 or 3x3) followed by max pooling (usually with a 2x2 pooling window, although 4x4 can be necessary for large input images).
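To see how such a stack shrinks a 400x400 image step by step, here is a minimal sketch that only tracks feature map sizes. The particular sequence of layers is an assumption chosen for illustration, as are the unpadded stride-1 convolutions and non-overlapping pooling; it is not a layout taken from the LeNet tutorial.

```python
def trace_shapes(input_side, layers):
    """Return the feature map side length after each layer in the stack."""
    side = input_side
    shapes = []
    for kind, size in layers:
        if kind == "conv":
            side = side - size + 1   # unpadded convolution, stride 1
        elif kind == "pool":
            side = side // size      # non-overlapping pooling, remainder dropped
        else:
            raise ValueError("unknown layer kind: " + kind)
        shapes.append((kind, size, side))
    return shapes


# Hypothetical stack: 4x4 pooling first for the large input, 2x2 afterwards.
layers = [("conv", 5), ("pool", 4), ("conv", 5), ("pool", 2), ("conv", 3), ("pool", 2)]

for kind, size, side in trace_shapes(400, layers):
    print(f"{kind} {size}x{size} -> {side}x{side}")
# conv 5x5 -> 396x396
# pool 4x4 -> 99x99
# conv 5x5 -> 95x95
# pool 2x2 -> 47x47
# conv 3x3 -> 45x45
# pool 2x2 -> 22x22
```

Each pooling step here discards far less than a single 19x19 window would, and the final maps are still small enough to feed to a fully connected layer.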

In conclusion: The best way to find a suitable network layout is literally to perform trial-and-error tests. Lots of tests. There is no one-size-fits-all network, and only you know the intrinsic complexity of your dataset. The most effective way of performing the necessary number of tests is through cross-validation.
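A minimal sketch of how such a comparison could be organised with k-fold cross-validation is given below. All names here are made up for illustration, and `placeholder_score` is only a stand-in that returns 0.0: a real version would train the network described by `config` on the training indices and return its accuracy on the validation indices.

```python
import random


def k_fold_indices(num_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k roughly equal, disjoint folds."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def cross_validate(candidates, num_samples, k, train_and_score):
    """Return the mean validation score of each candidate layout."""
    results = {}
    for name, config in candidates.items():
        scores = [
            train_and_score(config, train_idx, val_idx)
            for train_idx, val_idx in k_fold_indices(num_samples, k)
        ]
        results[name] = sum(scores) / len(scores)
    return results


def placeholder_score(config, train_idx, val_idx):
    # Stand-in only: replace with real training and evaluation code.
    return 0.0


# Hypothetical candidates: one versus two hidden layers.
candidates = {
    "one_hidden_layer": {"hidden_layers": 1},
    "two_hidden_layers": {"hidden_layers": 2},
}
results = cross_validate(candidates, num_samples=100, k=5,
                         train_and_score=placeholder_score)
print(results)
```

Because every candidate is scored on the same folds, the mean scores are directly comparable, and the layout with the best mean validation score is the one to keep.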
