Accuracy difference on normalization in KNN

Problem Description

I trained my model with the KNN classification algorithm and was getting around 97% accuracy. However, I later noticed that I had forgotten to normalise my data, so I normalised it and retrained the model; now I am getting an accuracy of only 87%. What could be the reason? And should I stick with the data that is not normalised, or should I switch to the normalised version?

Answer

To answer your question, you first need to understand how KNN works. Here is a simple diagram:

[Diagram omitted in this copy: a scatter of red and blue dots, with the point to be classified marked "?".]

Suppose the ? is the point you are trying to classify as either red or blue. For this case, let's assume you haven't normalized any of the data. As you can clearly see, the ? is closer to more red dots than blue dots. Therefore, this point would be assumed to be red. Let's also assume the correct label is red, so this is a correct match!
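
To make that vote concrete, here is a minimal sketch of k-NN computed by hand with NumPy. The red/blue clusters, the query point, and k=3 are made-up illustration values, not the asker's data:

```python
import numpy as np

# Toy 2-D data: a red cluster (label 0) and a blue cluster (label 1),
# plus a query point standing in for the "?" above. All values are invented.
X = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],   # red
              [6.0, 7.0], [6.5, 6.8], [7.0, 7.2]])  # blue
y = np.array([0, 0, 0, 1, 1, 1])
query = np.array([2.2, 2.5])

# k-NN: Euclidean distance to every point, majority vote among the k nearest.
k = 3
dists = np.linalg.norm(X - query, axis=1)
nearest = np.argsort(dists)[:k]
label = np.bincount(y[nearest]).argmax()
print("predicted:", "red" if label == 0 else "blue")  # -> red
```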

Now, to discuss normalization. Normalization is a way of taking data that is slightly dissimilar and giving it a common state (in your case, think of it as putting the features on a similar scale). Assume in the above example that you normalize the ?'s features, and as a result its y value becomes smaller. This would place the question mark below its current position, surrounded by more blue dots. Your algorithm would therefore label it as blue, which would be incorrect. Ouch!
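
The scale effect described here is easy to reproduce: when one feature spans a much wider range than another, it dominates the Euclidean distance, and normalizing can flip which neighbor is nearest. A small sketch with hypothetical income/age features (the numbers are invented for illustration):

```python
import numpy as np

# Hypothetical features on very different scales: income ($1000s) and age (years).
X = np.array([[ 51.0, 80.0],   # A: close in income, far in age
              [120.0, 31.0],   # B: far in income, close in age
              [ 20.0, 90.0],   # C, D: extra points that fix the feature ranges
              [200.0, 20.0]])
query = np.array([50.0, 30.0])

def nearest(pts, q):
    return np.argmin(np.linalg.norm(pts - q, axis=1))

print("raw nearest:", nearest(X, query))     # income dominates -> 0 (point A)

# Min-max normalization rescales each feature to [0, 1],
# so income and age now contribute comparably to the distance.
lo, hi = X.min(axis=0), X.max(axis=0)
Xn, qn = (X - lo) / (hi - lo), (query - lo) / (hi - lo)
print("scaled nearest:", nearest(Xn, qn))    # age now counts -> 1 (point B)
```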

Now to answer your questions. Sorry, but there is no universal answer! Sometimes normalizing data removes important feature differences, causing accuracy to go down. Other times, it helps to eliminate noise in your features that causes incorrect classifications. Also, just because accuracy goes up for the data set you are currently working with doesn't mean you will get the same results with a different data set.

Long story short, instead of trying to label normalization as good or bad, consider the feature inputs you are using for classification, determine which ones are important to your model, and make sure differences in those features are reflected accurately in your classification model. Best of luck!
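
In practice, the cleanest way to settle the question is to measure both options. A minimal sketch assuming scikit-learn (the question doesn't name a library, and the wine dataset is just a stand-in for the asker's data). Fitting the scaler inside a Pipeline keeps each cross-validation fold from leaking test-fold statistics into the scaling:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

raw    = KNeighborsClassifier(n_neighbors=5)
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# 5-fold cross-validated accuracy for each preprocessing choice.
print("raw   :", cross_val_score(raw,    X, y, cv=5).mean())
print("scaled:", cross_val_score(scaled, X, y, cv=5).mean())
```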
