sklearn: calculating accuracy score of k-means on the test data set

Views: 35; published: 2021/12/25 14:44:04; tags: python, scikit-learn, k-means


Problem description

I am running k-means clustering with 2 clusters on a set of 30 samples (I already know there are two classes). I split my data into a training set and a test set and try to calculate the accuracy score on the test set. I have two questions. First, I don't know whether computing an accuracy score on a test set even makes sense for k-means clustering. Second, if it does, is my implementation correct? Here is what I've tried:

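import numpy as np
import pandas as pd
from sklearn import cluster, metrics
from sklearn.model_selection import train_test_split

df_hist = pd.read_csv('video_data.csv')

# Separate the class labels from the feature columns
y = df_hist['label'].values
del df_hist['label']
df_hist.to_csv('video_data1.csv')
X = df_hist.values.astype(float)  # np.float was removed from NumPy; the builtin float is equivalent

# sklearn.cross_validation was replaced by sklearn.model_selection
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=70)
k_means = cluster.KMeans(n_clusters=2)
k_means.fit(X_train)
print(k_means.labels_[:])
print(y_train[:])

score = metrics.accuracy_score(y_test, k_means.predict(X_test))
print('Accuracy:{0:f}'.format(score))

k_means.predict(X_test)
print(k_means.labels_[:])
print(y_test[:])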

However, when I print the k-means labels for the test set (k_means.predict(X_test) followed by print(k_means.labels_[:])) and the y_test labels (print(y_test[:])) in the last three lines, I get the same labels as when I fit on X_train, rather than labels produced for X_test. Any idea what I might be doing wrong here? Is what I'm doing a valid way to evaluate the performance of k-means at all? Thank you!

Recommended answer

On evaluating accuracy: keep in mind that k-means is not a classification tool, so analyzing accuracy is not a very good idea. You can do it, but it is not what k-means is for. It looks for a grouping of the data that minimizes within-cluster distances (equivalently, maximizes between-cluster separation), and it does not use your labels during training. The cluster ids it assigns are therefore arbitrary and need not match your class labels. Consequently, methods like k-means are usually evaluated with the Rand index and other clustering metrics. To maximize accuracy you should fit an actual classifier, such as kNN, logistic regression, or an SVM.
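As a rough illustration of the point about clustering metrics, here is a minimal sketch that scores k-means on held-out data with scikit-learn's adjusted_rand_score. The data is a synthetic stand-in generated with make_blobs (the question's video_data.csv is not available), and the stratified split, n_init and random_state values are assumptions chosen only to make the example reproducible. It also shows one way to make plain accuracy meaningful for two clusters, assuming the true labels are encoded as 0 and 1: try both numberings of the clusters and keep the better one.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score, adjusted_rand_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data: 30 samples, 2 well-separated classes.
X, y = make_blobs(n_samples=30, centers=[[-5.0, -5.0], [5.0, 5.0]],
                  cluster_std=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=70, stratify=y)

# Fit on the training split only; the labels y_train are never used.
k_means = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
test_clusters = k_means.predict(X_test)

# The adjusted Rand index compares groupings, so it does not depend on
# how the cluster ids happen to be numbered.
ari = adjusted_rand_score(y_test, test_clusters)

# Raw accuracy does depend on that arbitrary numbering. With two clusters and
# 0/1 labels there are only two numberings, so keep the better of the two.
raw_acc = accuracy_score(y_test, test_clusters)
aligned_acc = max(raw_acc, 1.0 - raw_acc)

print('ARI: {0:f}'.format(ari))
print('Raw accuracy: {0:f}'.format(raw_acc))
print('Accuracy after aligning cluster ids: {0:f}'.format(aligned_acc))
```

With more than two clusters the max(acc, 1 - acc) shortcut no longer applies, and the cluster ids would have to be matched to the classes some other way before accuracy means anything.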

As for the code itself, k_means.predict(X_test) returns the predicted labels; it does not update the internal labels_ attribute, which still holds the labels of the training data passed to fit. To see the test-set labels, print the return value:

print(k_means.predict(X_test))
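To see the difference between the return value of predict and the labels_ attribute, a small sketch on made-up toy data may help. The arrays below are hypothetical and the n_init and random_state values are assumptions used only for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: 8 training points and 3 test points in two obvious groups.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
                    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [5.0, 5.3]])
X_test = np.array([[0.1, 0.1], [5.1, 5.1], [4.9, 5.0]])

k_means = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
labels_before = k_means.labels_.copy()

# predict() returns a new array with one cluster id per test sample.
test_labels = k_means.predict(X_test)

print(test_labels.shape)      # (3,)  one entry per test sample
print(k_means.labels_.shape)  # (8,)  still one entry per training sample
print(np.array_equal(labels_before, k_means.labels_))  # True: labels_ is unchanged
```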

Furthermore, in Python you do not need to (and should not) use [:] to print an array; just do:

print(k_means.labels_)
print(y_test)
