Loss & accuracy - Are these reasonable learning curves?


Question

I am learning neural networks and I built a simple one in Keras for the iris dataset classification from the UCI Machine Learning Repository. I used a network with one hidden layer of 8 nodes. The Adam optimizer is used with a learning rate of 0.0005, and the model is run for 200 epochs. Softmax is used at the output, with categorical cross-entropy as the loss. I am getting the following learning curves.

As you can see, the learning curve for the accuracy has a lot of flat regions, and I don't understand why. The loss seems to be decreasing steadily, but the accuracy doesn't seem to be increasing in the same manner. What do the flat regions in the accuracy learning curve imply? Why is the accuracy not increasing in those regions even though the loss seems to be decreasing?

Is this normal in training, or is it more likely that I am doing something wrong here?

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

# Load the iris dataset and split into features and labels
dataframe = pd.read_csv("iris.csv", header=None)
dataset = dataframe.values
X = dataset[:,0:4].astype(float)
y = dataset[:,4]

# Standardize the features
scalar = StandardScaler()
X = scalar.fit_transform(X)

# Encode the string labels as integers, then one-hot encode them
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)

encoder = OneHotEncoder()
y = encoder.fit_transform(y.reshape(-1,1)).toarray()

# create model
model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(3, activation='softmax'))

# Compile model
adam = optimizers.Adam(lr=0.0005, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy',
              optimizer=adam, 
              metrics=['accuracy'])

# Fit the model
log = model.fit(X, y, epochs=200, batch_size=5, validation_split=0.2)

fig = plt.figure()
fig.suptitle("Adam, lr=0.0005, one hidden layer")

ax = fig.add_subplot(1,2,1)
ax.set_title('Cost')
ax.plot(log.history['loss'], label='Training')
ax.plot(log.history['val_loss'], label='Validation')
ax.legend()

ax = fig.add_subplot(1,2,2)
ax.set_title('Accuracy')
ax.plot(log.history['acc'], label='Training')
ax.plot(log.history['val_acc'], label='Validation')
ax.legend()

fig.show()

Answer

A little understanding of the actual meanings (and mechanics) of both loss and accuracy will be of much help here (refer also to this answer of mine, although I will reuse some parts)...

For the sake of simplicity, I will limit the discussion to the case of binary classification, but the idea is generally applicable; here is the equation of the (logistic) loss:

loss = -(1/n) * sum_i [ y[i] * log(p[i]) + (1 - y[i]) * log(1 - p[i]) ]

where:

  • y[i] are the true labels (0 or 1)
  • p[i] are the predictions (real numbers in [0, 1]), usually interpreted as probabilities
  • output[i] (not shown in the equation) is the rounding of p[i], in order to convert them also to 0 or 1; it is this quantity that enters the calculation of accuracy, implicitly involving a threshold (normally 0.5 for binary classification), so that if p[i] > 0.5 then output[i] = 1, otherwise output[i] = 0.
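In code, these two per-sample quantities look as follows (a minimal NumPy sketch; the helper names are mine, for illustration):

import numpy as np

def sample_loss(y, p):
    # Per-sample logistic (binary cross-entropy) loss
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def sample_output(p, threshold=0.5):
    # Thresholded prediction; this is what enters the accuracy calculation
    return int(p > threshold)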

Now, let's suppose that we have a true label y[k] = 1, for which, at an early point during training, we make a rather poor prediction of p[k] = 0.1; then, plugging the numbers into the loss equation above:

  • the contribution of this sample to the loss is loss[k] = -log(0.1) = 2.3
  • since p[k] < 0.5, we'll have output[k] = 0, hence its contribution to the accuracy will be 0 (wrong classification)

Suppose now that, at the next training step, we are indeed getting better, and we get p[k] = 0.22; now we have:

  • loss[k] = -log(0.22) = 1.51
  • since it is still p[k] < 0.5, we again have a wrong classification (output[k] = 0) with zero contribution to the accuracy

Hopefully you start getting the idea, but let's see one more, later snapshot, where we get, say, p[k] = 0.49; then:

  • loss[k] = -log(0.49) = 0.71
  • still output[k] = 0, i.e. a wrong classification with zero contribution to the accuracy

As you can see, our classifier indeed got better on this particular sample, i.e. its loss went from 2.3 to 1.51 to 0.71, but this improvement has still not shown up in the accuracy, which cares only about correct classifications: from an accuracy viewpoint, it doesn't matter that we get better estimates for p[k], as long as these estimates remain below the threshold of 0.5.

The moment our p[k] exceeds the threshold of 0.5, the loss continues to decrease smoothly as it has so far, but now we have a jump in the accuracy contribution of this sample from 0 to 1/n, where n is the total number of samples.

Similarly, you can confirm for yourself that, once our p[k] has exceeded 0.5, hence giving a correct classification (and now contributing positively to the accuracy), further improvements of it (i.e. getting closer to 1.0) still continue to decrease the loss, but have no further impact on the accuracy.
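Here is a small demo reproducing the snapshots above for a sample with y[k] = 1 (0.1, 0.22 and 0.49 are the values from the text; 0.51 and 0.9 are extra illustrative ones of mine, beyond the threshold):

import numpy as np

# Predicted probabilities for a single sample with true label y = 1
for p in [0.1, 0.22, 0.49, 0.51, 0.9]:
    loss = -np.log(p)       # per-sample loss for y = 1
    output = int(p > 0.5)   # thresholded prediction
    print(f"p = {p:.2f}   loss = {loss:.2f}   output = {output}")

The loss shrinks at every step, while output (and hence the accuracy contribution of the sample) changes exactly once, when p crosses 0.5.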

Similar arguments hold for cases where the true label y[m] = 0 and the corresponding estimates for p[m] start somewhere above the 0.5 threshold; and even if the initial estimates of p[m] are below 0.5 (hence already providing correct classifications and contributing positively to the accuracy), their convergence towards 0.0 will decrease the loss without improving the accuracy any further.
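The mirrored check for a y[m] = 0 sample, with illustrative values of my own:

import numpy as np

# For y = 0 the per-sample loss is -log(1 - p); as p falls from 0.4
# towards 0.0 the loss keeps shrinking, but output = int(p > 0.5)
# is already 0 (correct), so the accuracy contribution cannot improve.
for p in [0.4, 0.2, 0.05]:
    print(f"p = {p:.2f}   loss = {-np.log(1 - p):.3f}   output = {int(p > 0.5)}")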

Putting the pieces together, hopefully you can now convince yourself that a smoothly decreasing loss and a more "stepwise" increasing accuracy are not only not incompatible, but indeed make perfect sense.

On a more general level: from the strict perspective of mathematical optimization, there is no such thing as "accuracy" - there is only the loss; accuracy enters the discussion only from a business perspective (and a different business logic might even call for a threshold different from the default 0.5). Quoting from my own linked answer:

Loss and accuracy are different things; roughly speaking, the accuracy is what we are actually interested in from a business perspective, while the loss is the objective function that the learning algorithms (optimizers) are trying to minimize from a mathematical perspective. Even more roughly speaking, you can think of the loss as the "translation" of the business objective (accuracy) to the mathematical domain, a translation which is necessary in classification problems (in regression ones, the loss and the business objective are usually the same, or at least can be the same in principle, e.g. the RMSE)...
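As a quick sketch of the threshold point, with made-up labels and predictions of my own (0.6 is an arbitrary alternative threshold): moving the threshold changes the accuracy while the loss stays untouched.

import numpy as np

y = np.array([1, 0, 1, 1, 0])                  # hypothetical true labels
p = np.array([0.55, 0.40, 0.58, 0.70, 0.35])   # hypothetical predicted probabilities

# The loss depends only on y and p, never on the decision threshold
loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

for threshold in [0.5, 0.6]:
    acc = np.mean((p > threshold).astype(int) == y)
    print(f"threshold = {threshold}: accuracy = {acc:.2f}, loss = {loss:.4f}")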

