Why is a model trained on MNIST not good at digits that are not in the center of the picture?


Problem Description

My CNN model reaches 99.4% accuracy on the MNIST dataset. So I tried some irregular inputs, and the predicted results were not correct.
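The question doesn't show the model itself; as a reference point, a minimal Keras CNN of the kind that typically reaches ~99% on MNIST looks roughly like the sketch below (the exact architecture here is an assumption, not the asker's code):

```python
# A minimal, illustrative MNIST CNN -- a stand-in for the asker's unshown model.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),   # note: spatial position is still encoded here
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```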

Here are some of the irregular inputs I used (digits shifted away from the center of the image).
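Off-center inputs of this kind can be constructed from MNIST itself by translating a centered digit toward a corner; a small sketch, assuming scipy is available:

```python
# Build an "irregular" input by pushing a centered MNIST digit into a corner.
# scipy.ndimage.shift performs the translation; vacated pixels are zero-filled.
import tensorflow as tf
from scipy.ndimage import shift

(_, _), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

digit = x_test[0].astype("float32") / 255.0       # a centered digit
off_center = shift(digit, (9, 9), cval=0.0)       # shifted toward bottom-right
```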

As we know, a CNN convolution will scan the whole image and doesn't care in which region of the image the key features appear.

Why can't the CNN handle irregular inputs?

Recommended Answer

"As we know, a CNN convolution will scan the whole image and doesn't care in which region of the image the key features appear."

This is simply false. CNNs do not "scan" the image; a single filter can be seen as scanning, but the network as a whole does not. A CNN is composed of many layers that eventually reduce the amount of information, and at some point it also uses location-specific features (in the final fully connected layers, in some global averaging, and so on). Consequently, while CNNs are robust to small perturbations (translations or noise, but not rotations!), they are not invariant to these transformations. In other words, moving an image 3 pixels to the left is fine, but trying to classify a digit at a completely different scale or position will fail, because nothing forces your model to be invariant to that. Some models that do learn this kind of invariance are Spatial Transformer Networks, but plain CNNs simply don't.
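One way to see this effect directly is to compare predictions on a centered digit and the same digit pushed into a corner; the sketch below assumes the `model`, `digit`, and `off_center` variables from the earlier snippets, and the augmentation layer shown is one common mitigation (short of a Spatial Transformer Network), not the answerer's prescription:

```python
# Compare predictions on a centered digit vs. the same digit in a corner.
# Assumes `model`, `digit`, and `off_center` from the sketches above.
import tensorflow as tf

centered_pred = model.predict(digit[None, ..., None]).argmax()
shifted_pred = model.predict(off_center[None, ..., None]).argmax()
print("centered:", centered_pred, " shifted:", shifted_pred)  # often disagree

# A common mitigation is random-shift augmentation during training, so the
# network actually sees off-center digits:
augment = tf.keras.layers.RandomTranslation(
    height_factor=0.3, width_factor=0.3, fill_mode="constant")
```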

