Cost function training target versus accuracy desired goal

Question

When we train neural networks, we typically use gradient descent, which relies on a continuous, differentiable real-valued cost function. The final cost function might, for example, take the mean squared error. Or put another way, gradient descent implicitly assumes the end goal is regression - to minimize a real-valued error measure.
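
Concretely, with targets $y_i$ and network outputs $\hat{y}_i$ over $N$ training samples, that cost is

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2$$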

Sometimes what we want a neural network to do is perform classification - given an input, classify it into two or more discrete categories. In this case, the end goal the user cares about is classification accuracy - the percentage of cases classified correctly.

But when we are using a neural network for classification, though our goal is classification accuracy, that is not what the neural network is trying to optimize. The neural network is still trying to optimize the real-valued cost function. Sometimes these point in the same direction, but sometimes they don't. In particular, I've been running into cases where a neural network trained to correctly minimize the cost function has a classification accuracy worse than a simple hand-coded threshold comparison.

I've boiled this down to a minimal test case using TensorFlow. It sets up a perceptron (neural network with no hidden layers), trains it on an absolutely minimal dataset (one input variable, one binary output variable), assesses the classification accuracy of the result, then compares it to the classification accuracy of a simple hand-coded threshold comparison; the results are 60% and 80% respectively. Intuitively, this is because a single outlier with a large input value generates a correspondingly large output value, so the way to minimize the cost function is to try extra hard to accommodate that one case, in the process misclassifying two more ordinary cases. The perceptron is correctly doing what it was told to do; it's just that this does not match what we actually want of a classifier. But classification accuracy is not a continuous differentiable function, so we can't use it as the target for gradient descent.

How can we train a neural network so that it ends up maximizing classification accuracy?

import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
tf.set_random_seed(1)

# Parameters
epochs = 10000
learning_rate = 0.01

# Data
train_X = [
    [0],
    [0],
    [2],
    [2],
    [9],
]
train_Y = [
    0,
    0,
    1,
    1,
    0,
]

rows = np.shape(train_X)[0]
cols = np.shape(train_X)[1]

# Inputs and outputs
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)

# Weights
W = tf.Variable(tf.random_normal([cols]))
b = tf.Variable(tf.random_normal([]))

# Model
pred = tf.tensordot(X, W, 1) + b
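# Mean squared error: sum of per-row squared errors divided by the row count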
cost = tf.reduce_sum((pred-Y)**2/rows)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
tf.global_variables_initializer().run()

# Train
for epoch in range(epochs):
    # Print update at successive doublings of time
    if (epoch & (epoch - 1)) == 0 or epoch == epochs - 1:
        print('{} {} {} {}'.format(
            epoch,
            cost.eval({X: train_X, Y: train_Y}),
            W.eval(),
            b.eval(),
            ))
    optimizer.run({X: train_X, Y: train_Y})

# Classification accuracy of perceptron
classifications = [pred.eval({X: x}) > 0.5 for x in train_X]
correct = sum([p == y for (p, y) in zip(classifications, train_Y)])
print('{}/{} = perceptron accuracy'.format(correct, rows))

# Classification accuracy of hand-coded threshold comparison
classifications = [x[0] > 1.0 for x in train_X]
correct = sum([p == y for (p, y) in zip(classifications, train_Y)])
print('{}/{} = threshold accuracy'.format(correct, rows))
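
With the fixed random seed above, the two final printouts should reproduce the figures quoted earlier: 3/5 (60%) for the perceptron versus 4/5 (80%) for the hand-coded threshold.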

Answer

I am still not sure if this is a well-posed question, let alone appropriate for SO; nevertheless, I'll give it a try, and maybe you will find at least some elements of my answer helpful.

How can we train a neural network so that it ends up maximizing classification accuracy?

I'm asking for a way to get a continuous proxy function that's closer to the accuracy

To start with, the loss function used today for classification tasks in (deep) neural nets was not invented with them, but it goes back several decades, and it actually comes from the early days of logistic regression. Here is the equation for the simple case of binary classification:
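
$$J = -\frac{1}{N} \sum_{i=1}^{N} \left[\, y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \,\right]$$

where $y_i \in \{0, 1\}$ is the true label of sample $i$ and $p_i$ is the predicted probability; this is the familiar log loss, also known as binary cross-entropy.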

The idea behind it was exactly to come up with a continuous & differentiable function, so that we would be able to exploit the (vast, and still expanding) arsenal of convex optimization for classification problems.

It is safe to say that the above loss function is the best we have so far, given the desired mathematical constraints mentioned above.
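
As an illustration (my own sketch, not part of the original answer): in TensorFlow this loss is available off the shelf, so switching the question's perceptron from mean squared error to log loss is a small change. The placeholders X and Y, the variables W and b, and learning_rate are assumed to be set up exactly as in the question's code:

# Treat the linear output as a logit rather than a direct prediction
logits = tf.tensordot(X, W, 1) + b
# sigmoid_cross_entropy_with_logits applies the sigmoid and the binary
# cross-entropy in a single, numerically stable op
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=Y, logits=logits))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# At prediction time, threshold the probability, not the raw logit
prob = tf.sigmoid(logits)
classifications = [prob.eval({X: x}) > 0.5 for x in train_X]

Because log loss penalizes a confidently wrong output roughly linearly in the logit, rather than quadratically in the raw output as squared error does, the single outlier at x = 9 dominates the objective far less.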

Should we consider this problem (i.e. better approximating the accuracy) solved and finished? At least in principle, no. I am old enough to remember an era when the only activation functions practically available were tanh and sigmoid; then came ReLU and gave a real boost to the field. Similarly, someone may eventually come up with a better loss function, but arguably this is going to happen in a research paper, and not as an answer to a SO question...

That said, the very fact that the current loss function comes from very elementary considerations of probability and information theory (fields that, in sharp contrast with the current field of deep learning, stand upon firm theoretical foundations) creates at least some doubt as to whether a better proposal for the loss may be just around the corner.

There is another subtle point on the relation between loss and accuracy, which makes the latter something qualitatively different than the former, and is frequently lost in such discussions. Let me elaborate a little...

All the classifiers relevant to this discussion (i.e. neural nets, logistic regression, etc.) are probabilistic ones; that is, they do not return hard class memberships (0/1) but class probabilities (continuous real numbers in [0, 1]).

Limiting the discussion for simplicity to the binary case: when converting a class probability to a (hard) class membership, we are implicitly involving a threshold, usually equal to 0.5, such that if p[i] > 0.5, then class[i] = "1". Now, we can find many cases where this naive default choice of threshold will not work (heavily imbalanced datasets are the first to come to mind), and we'll have to choose a different one. But the important point for our discussion here is that this threshold selection, while being of central importance to the accuracy, is completely external to the mathematical optimization problem of minimizing the loss, and serves as a further "insulation layer" between them, compromising the simplistic view that loss is just a proxy for accuracy (it is not).
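
To make this concrete, here is a tiny sketch (hypothetical numbers, not from the question's data): the model, and therefore the loss, is held fixed, and only the external threshold changes, yet the accuracy moves from 60% to 80%:

import numpy as np

y_true = np.array([0, 0, 1, 1, 0])
p = np.array([0.25, 0.30, 0.45, 0.60, 0.55])  # hypothetical predicted probabilities

for threshold in (0.5, 0.4):
    accuracy = np.mean((p > threshold).astype(int) == y_true)
    print('threshold={}: accuracy={:.0%}'.format(threshold, accuracy))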

Enlarging somewhat an already broad discussion: Can we possibly move completely away from the (very) limiting constraint of mathematical optimization of continuous & differentiable functions? In other words, can we do away with back-propagation and gradient descent?

Well, we are actually doing so already, at least in the sub-field of reinforcement learning: 2017 was the year when new research from OpenAI on something called Evolution Strategies ("Evolution Strategies as a Scalable Alternative to Reinforcement Learning") made headlines. And as an extra bonus, there is an ultra-fresh (Dec 2017) paper by Uber on the subject (deep neuroevolution), again generating much enthusiasm in the community.
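
To give a flavour of that idea, here is a toy sketch of mine (not from either paper): a crude (1+1) evolution strategy that optimizes classification accuracy directly on the question's dataset. Since it only ever evaluates the objective and never differentiates it, the non-differentiability of accuracy is no obstacle:

import numpy as np

rng = np.random.RandomState(1)
train_X = np.array([0., 0., 2., 2., 9.])
train_Y = np.array([0, 0, 1, 1, 0])

def accuracy(w, b):
    # The non-differentiable objective: fraction of cases classified correctly
    return np.mean(((train_X * w + b) > 0.5).astype(int) == train_Y)

w, b = rng.randn(), rng.randn()
for step in range(200):
    # Perturb the parameters and keep any change that does not hurt accuracy
    dw, db = 0.1 * rng.randn(), 0.1 * rng.randn()
    if accuracy(w + dw, b + db) >= accuracy(w, b):
        w, b = w + dw, b + db

print('accuracy: {:.0%}'.format(accuracy(w, b)))

On this dataset the search settles at 4/5 = 80%, matching the hand-coded threshold, which is the best any single linear cut can achieve here.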

These are my thoughts, based on my own understanding of your question. Even if this understanding is not correct, as I already said, hopefully you'll find some helpful elements here...
