kNN: training, testing, and validation

Question

I am extracting image features from 10 classes with 1000 images each. Since there are 50 features that I can extract, I am thinking of finding the best feature combination to use here. Training, validation and test sets are divided as follows:

Training set = 70%
Validation set = 15%
Test set = 15%

I use forward feature selection on the validation set to find the best feature combination and finally use the test set to check the overall accuracy. Could someone please tell me whether I am doing it right?
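The procedure described above can be sketched in a few lines. This is an illustrative assumption of the questioner's setup, not their actual code: the helper names (`knn_accuracy`, `forward_select`) and the tiny kNN implementation are made up for the example; greedy forward selection adds one feature at a time, keeping whichever candidate most improves validation accuracy.

```python
import numpy as np

def knn_accuracy(X_tr, y_tr, X_val, y_val, k=3):
    # Classify each validation point by majority vote among its k nearest
    # training points (Euclidean distance); return the validation accuracy.
    correct = 0
    for x, y in zip(X_val, y_val):
        d = np.linalg.norm(X_tr - x, axis=1)
        nearest = y_tr[np.argsort(d)[:k]]
        vals, cnts = np.unique(nearest, return_counts=True)
        correct += vals[np.argmax(cnts)] == y
    return correct / len(y_val)

def forward_select(X_tr, y_tr, X_val, y_val, max_feats=5):
    # Greedy forward selection: start with no features and repeatedly add
    # the single feature column that most improves validation accuracy.
    selected, best_score = [], 0.0
    remaining = set(range(X_tr.shape[1]))
    while remaining and len(selected) < max_feats:
        scores = {f: knn_accuracy(X_tr[:, selected + [f]], y_tr,
                                  X_val[:, selected + [f]], y_val)
                  for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:
            break  # no remaining feature improves validation accuracy
        selected.append(f_best)
        best_score = scores[f_best]
        remaining.remove(f_best)
    return selected, best_score
```

The stopping rule (break when no candidate improves the score) is one common choice; another is to always grow to `max_feats` and keep the best subset seen.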

Answer

So kNN is an exception to the general workflow for building/testing supervised machine learning models. In particular, the model created via kNN is just the available labeled data, placed in some metric space.

In other words, for kNN, there is no training step because there is no model to build. Template matching & interpolation is all that is going on in kNN.
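The point that there is no model to build can be made concrete in a few lines. In this minimal sketch (with made-up data points; `knn_predict` is an illustrative name), "training" is literally just keeping the labeled data around, and prediction is a distance lookup plus a vote:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distances from the query point to every stored point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Labels of the k nearest stored points
    nearest = y_train[np.argsort(dists)[:k]]
    # Majority vote among those neighbors
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# The "model" is nothing more than the stored labeled data:
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1])))  # -> 0
```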

Neither is there a validation step. Validation measures model accuracy against the training data as a function of iteration count (training progress). Overfitting is evidenced by the upward movement of this empirical curve and indicates the point at which training should cease. In other words, because no model is built, there is nothing to validate.

But you can still test--i.e., assess the quality of the predictions using data in which the targets (labels or scores) are concealed from the model.

But even testing is a little different for kNN versus other supervised machine learning techniques. In particular, for kNN the quality of predictions is of course dependent upon the amount of data, or more precisely the density (number of points per unit volume)--i.e., if you are going to predict unknown values by averaging the 2-3 points closest to them, then it helps if you have points close to the one you wish to predict. Therefore, keep the size of the test set small, or better yet use k-fold cross-validation or leave-one-out cross-validation, both of which give you more thorough model testing, but not at the cost of reducing the size of your kNN neighbor population.
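The leave-one-out suggestion can be sketched as follows (a minimal illustration with hypothetical data; `loocv_knn_accuracy` is an invented name). Each point takes one turn as the test set while all remaining points serve as the kNN neighbor pool, so the pool shrinks by only a single point per fold:

```python
import numpy as np

def loocv_knn_accuracy(X, y, k=3):
    # Leave-one-out cross-validation: hold out each point in turn and
    # classify it against all of the other points.
    correct = 0
    for i in range(len(X)):
        X_pool = np.delete(X, i, axis=0)  # neighbor pool without point i
        y_pool = np.delete(y, i)
        d = np.linalg.norm(X_pool - X[i], axis=1)
        nearest = y_pool[np.argsort(d)[:k]]
        vals, cnts = np.unique(nearest, return_counts=True)
        correct += vals[np.argmax(cnts)] == y[i]
    return correct / len(X)
```

With n points this tests on all n of them, while every prediction is still made against n-1 neighbors, which is exactly the "thorough testing without shrinking the neighbor population" trade-off the answer describes.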
