Why does normalizing labels in MxNet make accuracy close to 100%?


Problem description

I am training a model using multi-label logistic regression on MxNet (gluon api) as described here: multi-label logit in gluon. My custom dataset has 13 features and one label of shape [,6]. My features are normalized from their original values to [0,1]. I use a simple dense neural net with 2 hidden layers.

I noticed that when I don't normalize the labels (which take the discrete values 1, 2, 3, 4, 5 and 6; the mapping of categorical values to these numbers is purely my choice), my training process slowly converges to some minimum, for example:

Epoch: 0, ela: 8.8 sec, Loss: 1.118188, Train_acc 0.5589, Test_acc 0.5716
Epoch: 1, ela: 9.6 sec, Loss: 0.916276, Train_acc 0.6107, Test_acc 0.6273
Epoch: 2, ela: 10.3 sec, Loss: 0.849386, Train_acc 0.6249, Test_acc 0.6421
Epoch: 3, ela: 9.2 sec, Loss: 0.828530, Train_acc 0.6353, Test_acc 0.6304
Epoch: 4, ela: 9.3 sec, Loss: 0.824667, Train_acc 0.6350, Test_acc 0.6456
Epoch: 5, ela: 9.3 sec, Loss: 0.817131, Train_acc 0.6375, Test_acc 0.6455
Epoch: 6, ela: 10.6 sec, Loss: 0.815046, Train_acc 0.6386, Test_acc 0.6333
Epoch: 7, ela: 9.4 sec, Loss: 0.811139, Train_acc 0.6377, Test_acc 0.6289
Epoch: 8, ela: 9.2 sec, Loss: 0.808038, Train_acc 0.6381, Test_acc 0.6484
Epoch: 9, ela: 9.2 sec, Loss: 0.806301, Train_acc 0.6405, Test_acc 0.6485
Epoch: 10, ela: 9.4 sec, Loss: 0.804517, Train_acc 0.6433, Test_acc 0.6354
Epoch: 11, ela: 9.1 sec, Loss: 0.803954, Train_acc 0.6389, Test_acc 0.6280
Epoch: 12, ela: 9.3 sec, Loss: 0.803837, Train_acc 0.6426, Test_acc 0.6495
Epoch: 13, ela: 9.1 sec, Loss: 0.801444, Train_acc 0.6424, Test_acc 0.6328
Epoch: 14, ela: 9.4 sec, Loss: 0.799847, Train_acc 0.6445, Test_acc 0.6380
Epoch: 15, ela: 9.1 sec, Loss: 0.795130, Train_acc 0.6454, Test_acc 0.6471

However, when I normalize the labels and train again, I get this weird result showing 99.99% accuracy on both training and testing:

Epoch: 0, ela: 12.3 sec, Loss: 0.144049, Train_acc 0.9999, Test_acc 0.9999
Epoch: 1, ela: 12.7 sec, Loss: 0.023632, Train_acc 0.9999, Test_acc 0.9999
Epoch: 2, ela: 12.3 sec, Loss: 0.013996, Train_acc 0.9999, Test_acc 0.9999
Epoch: 3, ela: 12.7 sec, Loss: 0.010092, Train_acc 0.9999, Test_acc 0.9999
Epoch: 4, ela: 12.7 sec, Loss: 0.007964, Train_acc 0.9999, Test_acc 0.9999
Epoch: 5, ela: 12.6 sec, Loss: 0.006623, Train_acc 0.9999, Test_acc 0.9999
Epoch: 6, ela: 12.6 sec, Loss: 0.005700, Train_acc 0.9999, Test_acc 0.9999
Epoch: 7, ela: 12.4 sec, Loss: 0.005026, Train_acc 0.9999, Test_acc 0.9999
Epoch: 8, ela: 12.6 sec, Loss: 0.004512, Train_acc 0.9999, Test_acc 0.9999

How is this possible? Why does normalizing the labels affect training accuracy in this way?

Answer

The tutorial you linked to does multiclass classification. In multiclass classification, the label for an example is a one-hot array. For example, the label [0 0 1 0] means the example belongs to class 2 (assuming classes are numbered from 0). Normalizing this vector makes no sense, because its values are already between 0 and 1. Moreover, in multiclass classification only one entry of the label can be true and the others must be false; values other than 0 and 1 have no meaning.
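A minimal numpy sketch of this point (numpy stands in here for whatever framework builds the labels; the class count of 4 is just for illustration):

```python
import numpy as np

# One-hot label for class 2 out of 4 classes (classes numbered from 0)
num_classes = 4
label = np.eye(num_classes)[2]
print(label)  # [0. 0. 1. 0.]

# The entries are already in [0, 1], so min-max normalization
# leaves the vector unchanged
normalized = (label - label.min()) / (label.max() - label.min())
assert np.array_equal(label, normalized)
```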

When representing a batch of examples, it is common to write the labels as integers instead of one-hot arrays for readability. For example, the labels [4 6 1 7] mean the first example belongs to class 4, the second example belongs to class 6, and so on. Normalizing this representation also makes no sense, because it is converted internally into one-hot arrays.
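The integer-to-one-hot conversion can be sketched with numpy (this mirrors how a softmax cross-entropy loss with sparse labels, such as gluon's `SoftmaxCrossEntropyLoss` with its default `sparse_label=True`, interprets integer labels; the batch values are taken from the example above):

```python
import numpy as np

# Integer class labels for a batch of 4 examples
labels = np.array([4, 6, 1, 7])
num_classes = 8

# Each integer is used as an index selecting one row of the identity
# matrix, i.e. a one-hot vector
one_hot = np.eye(num_classes)[labels]

# The representation round-trips: argmax recovers the integer labels
print(one_hot.argmax(axis=1))  # [4 6 1 7]
```

Because the integers are used as array indices, only whole numbers in [0, num_classes) are meaningful here.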

Now, if you normalize the second representation, the behavior is undefined, because floating-point numbers cannot be array indices. Something strange is probably happening to give you the 99% accuracy. Perhaps you normalized the values into [0, 1] and the resulting one-hot arrays mostly point to class 0 and rarely to class 1; that alone could give you a 99% accuracy.

I would suggest not normalizing the labels.

