I have an error with k-nearest neighbors


Problem description

I have a k-nearest-neighbor (kNN) C++ program that works with two text files: the first has 1405 training vectors and the second has 810 test vectors. The kNN code takes both files and classifies each vector in the test file using the training file. Finally, it reports an accuracy, and that accuracy is where the error is.
When I run the code with k=1 the accuracy is 89%, but with k=3 it is 186%.
My question: how can the accuracy exceed 100%?
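Working through the numbers in the question already narrows the problem down. This assumes the accuracy is the number of correctly classified test vectors divided by the 810 test vectors; the accuracy calculation itself is not shown in the question.

$$
\text{accuracy} = \frac{\text{correctly classified}}{810} \times 100\%
$$

$$
k = 1:\; 0.89 \times 810 \approx 721, \qquad k = 3:\; 1.86 \times 810 \approx 1507 > 810
$$

For k=3 the numerator would have to be about 1507, more than the number of test vectors (and even more than the 1405 training vectors). That is impossible if each test vector is counted at most once, so the count or the divisor is being produced incorrectly; a better classifier alone could not cause this.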

This is the kNN code:


/* Returns the NUMBER of correctly classified test instances (a count, */
/* not a percentage). The accuracy is this value divided by            */
/* data.size(), so on its own it can never exceed 100%.                */
int TestKNN (TRAINING_EXAMPLES_LIST *tlist, TRAINING_EXAMPLES_LIST data, 
			 bool isInstanceWeighted, MODE mode,
			 bool isBackwardElimination, bool isAttWKNN)
{
	int correctlyClassifiedInstances = 0;
	TRAINING_EXAMPLES_LIST::iterator testIter;
	TrainingExample tmpTestObj;
	uint index[K];	/* passed to PredictByKNN, which does not read it */

	for(testIter = data.begin(); testIter != data.end(); ++testIter)
	{
		tmpTestObj = *testIter;
		/* Predict the class for the query point */
		int predictedClass = PredictByKNN(tlist, tmpTestObj.Value, 
										  isInstanceWeighted, 
										  index, mode, isBackwardElimination, 
										  isAttWKNN);
		/* The last attribute holds the true class label; each test */
		/* instance is counted at most once                         */
		if(((int)(tmpTestObj.Value[NO_OF_ATT-1])) == predictedClass)
			correctlyClassifiedInstances++;
	}
	return correctlyClassifiedInstances;
}


int PredictByKNN (TRAINING_EXAMPLES_LIST *tlist, double *query, 
				  bool isWeightedKNN, uint *index, MODE mode, 
				  bool isBE, bool isAttWeightedKNN)
{
	/* Note: index, mode and isWeightedKNN are not read below. */
	double distance = 0.0;
	TRAINING_EXAMPLES_LIST::iterator iter;
	TrainingExample tmpObj;
	TRAINING_EXAMPLES_LIST elistWithD;	/* local list, empty on every call */

	/* If we are in for backward elimination or attribute WKNN */
	/* then Instance WKNN has to be false                      */
	if(isBE || isAttWeightedKNN)
		isWeightedKNN = false;

	/* Calculate the attribute-weighted Euclidean distance */
	/* of the query point from all training instances      */
	for(iter = tlist->begin(); iter != tlist->end(); ++iter)
	{
		tmpObj = *iter;
		distance = 0.0;

		/* The last attribute is the class label, so it is skipped. */
		/* diff * diff is never negative, so abs() is not needed;   */
		/* depending on the headers included, abs() on a double can */
		/* resolve to the int overload and truncate the difference. */
		for(int j = 0; j < NO_OF_ATT - 1; j++)
		{
			double diff = query[j] - tmpObj.Value[j];
			distance += diff * diff * (attWeights[j] * attWeights[j]);
		}
		distance = sqrt(distance);

		/* If the distance is (almost) zero, treat the training point as */
		/* the query point itself and push it to the end of the list.    */
		/* Since the test vectors come from a separate file, this also   */
		/* skips training vectors that are genuine duplicates of them.   */
		if((int)(distance*1000) == 0)
			distance = 999999999999999.9;

		tmpObj.Distance = distance;
		elistWithD.insert(elistWithD.end(), tmpObj);
	}

	/* Sort the points on distance in ascending order */
	elistWithD.sort(compare);

	/* Simple KNN, Attribute Weighted KNN, Backward Elimination: */
	/* count how the K nearest neighbors are classified          */
	int classCount[NO_OF_CLASSES];
	for(int i = 0; i < NO_OF_CLASSES; i++)
		classCount[i] = 0;

	int knn = K;
	for(iter = elistWithD.begin(); iter != elistWithD.end(); ++iter)
	{
		tmpObj = *iter;
		int label = (int)tmpObj.Value[NO_OF_ATT-1];
		/* Labels must lie in 0..NO_OF_CLASSES-1; any other value would */
		/* write past the end of classCount (undefined behavior).       */
		if(label >= 0 && label < NO_OF_CLASSES)
			classCount[label]++;
		knn--;
		if(!knn)
			break;
	}

	int maxClass = 0;
	int maxCount = 0;

	/* Find the class represented the maximum number of times */
	/* among the K neighbors (ties go to the lower class index) */
	for(int i = 0; i < NO_OF_CLASSES; i++)
	{
		if(classCount[i] > maxCount)
		{
			maxClass = i;
			maxCount = classCount[i];
		}
	}

	return maxClass;
}
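For comparison, here is a minimal, self-contained sketch of the same majority-vote kNN using `std::vector` instead of `TRAINING_EXAMPLES_LIST`. The names `Example`, `squaredDistance`, `predictKnn` and `accuracyPercent` are hypothetical. It assumes integer class labels in `0..numClasses-1` and that test vectors are not part of the training set, so no neighbor is excluded. It also shows where the accuracy comes from: a count of correct predictions divided by the number of test vectors.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical example type: feature vector plus integer class label.
struct Example {
    std::vector<double> features;
    int label;  // assumed to lie in [0, numClasses)
};

// Squared Euclidean distance. diff * diff is never negative, so no abs() is
// needed, and skipping sqrt() does not change which neighbors are nearest.
double squaredDistance(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t j = 0; j < a.size() && j < b.size(); ++j) {
        double diff = a[j] - b[j];
        sum += diff * diff;
    }
    return sum;
}

// Majority vote among the k nearest training examples.
int predictKnn(const std::vector<Example>& train, const std::vector<double>& query,
               std::size_t k, int numClasses) {
    std::vector<std::pair<double, int>> byDistance;  // (distance, label)
    byDistance.reserve(train.size());
    for (const Example& e : train)
        byDistance.emplace_back(squaredDistance(query, e.features), e.label);

    // Only the first k entries need to end up in sorted order.
    k = std::min(k, byDistance.size());
    std::partial_sort(byDistance.begin(),
                      byDistance.begin() + static_cast<std::ptrdiff_t>(k),
                      byDistance.end());

    std::vector<int> votes(static_cast<std::size_t>(numClasses), 0);
    for (std::size_t i = 0; i < k; ++i)
        ++votes.at(static_cast<std::size_t>(byDistance[i].second));  // throws on a bad label

    // max_element returns the first maximum, so ties go to the lower class
    // index, as with the strict '>' in the original loop.
    return static_cast<int>(std::max_element(votes.begin(), votes.end()) - votes.begin());
}

// Accuracy in percent: correct predictions divided by the number of test vectors.
double accuracyPercent(const std::vector<Example>& train, const std::vector<Example>& test,
                       std::size_t k, int numClasses) {
    if (test.empty())
        return 0.0;
    std::size_t correct = 0;
    for (const Example& t : test)
        if (predictKnn(train, t.features, k, numClasses) == t.label)
            ++correct;
    // correct <= test.size(), so the result always lies in [0, 100].
    return 100.0 * static_cast<double>(correct) / static_cast<double>(test.size());
}
```

Changing k only changes how many neighbors vote; the numerator is still bounded by the number of test vectors. `std::partial_sort` orders just the k closest points, which is cheaper than fully sorting all training vectors for every query.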

Recommended answer

Please use the debugger or logging: locate the place where you calculate the accuracy (assuming this is your code ;-)) and detect the case when it exceeds 100%. You can simply modify the code to add this check and write a log entry or an error message. This way, you find the exact steps to reproduce the problem. (Most likely, you already know those steps.) Then put a breakpoint on a line that is executed only when the accuracy exceeds 100%, so the debugger stops only when the conditions that reproduce the problem are met. At that point, look at the "Call stack" debug window; it tells you exactly where the execution came from. It shows you where to debug further, closer to the root of the problem. This way, you locate the exact cause in a small number of steps.
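A minimal sketch of the check described above. It assumes the accuracy is computed from the count returned by `TestKNN` and the number of test vectors; the helper name `ReportAccuracy` and its parameters are hypothetical. The body of the `if` runs only in the failing case, so it is the natural place for the breakpoint.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: turn the count returned by TestKNN into a percentage
// and log loudly when the inputs cannot give a valid accuracy.
double ReportAccuracy(int correctlyClassified, std::size_t testCount, int k) {
    double accuracy = (testCount == 0)
        ? 0.0
        : 100.0 * correctlyClassified / static_cast<double>(testCount);

    if (accuracy > 100.0 || correctlyClassified < 0) {
        // Put a breakpoint on the next line: it executes only in the failing
        // case, and the call stack then shows where the bad values came from.
        std::fprintf(stderr, "k=%d: %d correct out of %zu test vectors (%.1f%%)\n",
                     k, correctlyClassified, testCount, accuracy);
    }
    return accuracy;
}
```

With the figures from the question, a call such as `ReportAccuracy(correct, 810, 3)` would log for k=3 and stay silent for k=1.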

It may prove more efficient than just staring at your code sample. Even better, it will help you move on without asking another question next time. The skill is more important than the resolution.

—SA

