What is the difference between the training, validation, and test sets in neural networks?


Question

I'm using this library to implement a learning agent.

I have generated the training cases, but I'm not sure what the validation and test sets are.
The teacher says:

70% should be training cases, 10% should be test cases, and the remaining 20% should be validation cases.
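A 70/10/20 split like the one described can be sketched as follows. This is a minimal illustration in Python 3, not part of the library in question: the function name `split_dataset`, its parameters, and the toy `cases` list are all hypothetical, and it assumes the cases can be shuffled freely (no time ordering or grouping to preserve).

```python
import random

def split_dataset(cases, train_frac=0.7, test_frac=0.1, seed=0):
    """Shuffle the cases and split them into (train, validation, test) lists."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = int(round(train_frac * len(shuffled)))
    n_test = int(round(test_frac * len(shuffled)))
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains (about 20%)
    return train, validation, test

# Toy (input, target) cases, made up for illustration only.
cases = [([i], [i % 2]) for i in range(100)]
train, validation, test = split_dataset(cases)
print(len(train), len(validation), len(test))
```

Shuffling before slicing matters when the cases were generated in some order, since otherwise the three sets may not be representative of each other.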

Edit

I have this code for training, but I have no idea when to stop training.

  def train(self, train, validation, N=0.3, M=0.1):
    # N: learning rate
    # M: momentum factor
    accuracy = list()
    while(True):
        error = 0.0
        for p in train:
            input, target = p
            self.update(input)
            error = error + self.backPropagate(target, N, M)
        print "validation"
        total = 0
        for p in validation:
            input, target = p
            output = self.update(input)
            total += sum(abs(t - o) for t, o in zip(target, output))  # sum of absolute differences between target and output

        accuracy.append(total)
        print min(accuracy)
        print sum(accuracy[-5:])/5
        #if i % 100 == 0:
        print 'error %-14f' % error
        if ? < ?:
            break

Edit

I can get an average error of 0.2 on the validation data after maybe 20 training iterations. Should that be 80%?

average error = (sum of absolute differences between the validation targets and the outputs produced for the validation inputs) / (size of the validation data)
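The average-error definition above can be written as a small function. This is a sketch in Python 3 under stated assumptions: `predict` stands in for the network's forward pass (the `update` method in the question's code), and `constant_predictor` and the three validation cases are hypothetical values chosen only to make the example runnable.

```python
def average_validation_error(predict, validation):
    """Mean, over validation cases, of the summed absolute output error."""
    total = 0.0
    for inputs, target in validation:
        output = predict(inputs)
        total += sum(abs(t - o) for t, o in zip(target, output))
    return total / len(validation)

# Hypothetical stand-in for the network: always outputs 0.5.
def constant_predictor(inputs):
    return [0.5]

# Made-up validation cases as (input, target) pairs.
validation_cases = [([0.0], [0.0]), ([1.0], [1.0]), ([2.0], [0.5])]
avg_error = average_validation_error(constant_predictor, validation_cases)
print(avg_error)
```

An average absolute error and a classification accuracy are different quantities, so an error of 0.2 does not by itself imply that 80% of the cases are predicted correctly.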

1
        avg error 0.520395 
        validation
        0.246937882684
2
        avg error 0.272367   
        validation
        0.228832420879
3
        avg error 0.249578    
        validation
        0.216253590304
        ...
22
        avg error 0.227753
        validation
        0.200239244714
23
        avg error 0.227905    
        validation
        0.199875013416

Recommended answer

The training and validation sets are used during training.

for each epoch
    for each training data instance
        propagate error through the network
        adjust the weights
        calculate the accuracy over training data
    for each validation data instance
        calculate the accuracy over the validation data
    if the threshold validation accuracy is met
        exit training
    else
        continue training
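The loop above can be sketched as runnable Python 3. To stay self-contained it does not use a neural network: it assumes a hypothetical one-weight model `y = w * x` trained by stochastic gradient descent, uses a validation error threshold in place of an accuracy threshold, and adds a `max_epochs` cap so the loop always terminates. All names and data here are illustrative.

```python
def train_with_validation(train, validation, lr=0.1, target_error=0.01, max_epochs=1000):
    """Fit y = w * x by SGD; stop once the mean validation error is small enough."""
    w = 0.0
    epoch = 0
    val_error = float("inf")
    for epoch in range(1, max_epochs + 1):
        # Training pass: the weight is adjusted using training data only.
        for x, y in train:
            w -= lr * (w * x - y) * x  # gradient step for 0.5 * (w*x - y)**2
        # Validation pass: measure error, but do not touch the weight.
        val_error = sum(abs(w * x - y) for x, y in validation) / len(validation)
        if val_error <= target_error:
            break  # threshold met, exit training
    return w, epoch, val_error

# Made-up data generated from y = 2 * x.
train_cases = [(x / 10.0, 2.0 * x / 10.0) for x in range(1, 8)]
validation_cases = [(0.8, 1.6), (0.9, 1.8)]
w, epochs_run, val_error = train_with_validation(train_cases, validation_cases)
print(w, epochs_run, val_error)
```

The key point mirrored from the pseudocode is that the validation cases only ever feed the stopping decision, never the weight update.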

Once you have finished training, run the network against your test set and verify that the accuracy is sufficient.

Training Set: this data set is used to adjust the weights of the neural network.

Validation Set: this data set is used to minimize overfitting. You do not adjust the network's weights with it; you only check that any gain in accuracy on the training set also yields a gain in accuracy on data the network has not been trained on (the validation set). If accuracy on the training set increases while accuracy on the validation set stays the same or decreases, the network is overfitting and you should stop training.
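One common way to turn the "validation stops improving" rule into a stopping test is a patience check, sketched below in Python 3. This is an assumption-laden illustration rather than the answer's own method: it tracks validation error (lower is better) instead of accuracy, and the function name `should_stop`, the `patience` parameter, and the error values are all hypothetical.

```python
def should_stop(val_errors, patience=5):
    """Return True if the best validation error is at least `patience` epochs old."""
    if len(val_errors) <= patience:
        return False  # not enough history to judge yet
    best_epoch = val_errors.index(min(val_errors))
    return len(val_errors) - 1 - best_epoch >= patience

# Made-up per-epoch validation errors: improving at first, then getting worse.
history = []
for err in [0.25, 0.23, 0.22, 0.20, 0.21, 0.21, 0.22, 0.23, 0.24, 0.25]:
    history.append(err)
    if should_stop(history, patience=3):
        break
print(len(history))  # 7: the best error (0.20) was three epochs ago
```

Waiting a few epochs before stopping avoids reacting to a single noisy validation measurement. In practice one would also keep a copy of the weights from the best epoch and restore them after stopping.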

Testing Set: this data set is used only for testing the final solution, in order to confirm the actual predictive power of the network.
