Why am I not getting points around clusters in this kmeans implementation?


Question

In the k-means analysis below, I assign a 1 or 0 to indicate whether a word is associated with a user:

cells = c(1,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,0,1,1,1,1,1,1)
rnames = c("a1","a2","a3","a4","a5","a6","a7","a8","a9")
cnames = c("google","so","test")

x <- matrix(cells, nrow=9, ncol=3, byrow=TRUE, dimnames=list(rnames, cnames))

# run K-Means
km <- kmeans(x, 3, 15)

# print components of km
print(km)

# plot clusters
plot(x, col = km$cluster)
# plot centers
points(km$centers, col = 1:2, pch = 8)

This is the plot:

Why do I not get multiple points around each cluster? What does this graph indicate? I would like to suggest a word to a user depending on whether another user has the same word configured.

Recommended answer

You don't see multiple points because your data are discrete, categorical observations. K-means is really only suitable for grouping continuous observations. Every value is either 0 or 1, so your nine users can only fall on three distinct positions in the plot you've shown, with several users drawn on top of each other at each position, and three points don't make a nice "cloud" of data.
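The overlap can be checked directly. The following is a minimal sketch, assuming the matrix from the question; the names `xy` and `position_counts` are illustrative and not part of the original code. It relies on the fact that `plot()` called on a matrix uses only its first two columns as coordinates, counts how many users share each position, and then jitters the points so the overlapping ones become visible.

```r
# Rebuild the matrix from the question (same values, grouped by row)
cells <- c(1,1,1, 0,1,0, 1,0,1, 0,1,0, 1,0,1, 0,1,0, 1,0,0, 1,1,1, 1,1,1)
x <- matrix(cells, nrow = 9, ncol = 3, byrow = TRUE,
            dimnames = list(paste0("a", 1:9), c("google", "so", "test")))

# plot() on a matrix uses only the first two columns as x and y
xy <- x[, c("google", "so")]

# Count how many users land on each distinct (google, so) position
position_counts <- table(paste(xy[, "google"], xy[, "so"], sep = ","))
print(position_counts)

# Jittering adds small random noise so overlapping points separate
# (illustration only; the underlying data are still 0/1)
set.seed(1)
plot(jitter(xy, amount = 0.05), xlab = "google", ylab = "so")
```

The jitter is only a display aid. It does not make the data continuous or make k-means a better fit for them.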

This suggests to me that k-means is probably not appropriate for your specific problem.
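One possible alternative for 0/1 data, offered here only as a sketch and not as something the original answer proposed, is to compare users with a distance designed for binary vectors and cluster hierarchically. The example below uses base R's `dist(method = "binary")` (a Jaccard-style distance) with `hclust()` and `cutree()`. The helper `suggest_words()` is hypothetical: it takes the nearest other user and returns the words that user has which the target user lacks.

```r
# Rebuild the matrix from the question (same values, grouped by row)
cells <- c(1,1,1, 0,1,0, 1,0,1, 0,1,0, 1,0,1, 0,1,0, 1,0,0, 1,1,1, 1,1,1)
x <- matrix(cells, nrow = 9, ncol = 3, byrow = TRUE,
            dimnames = list(paste0("a", 1:9), c("google", "so", "test")))

# Binary (Jaccard-style) distance: among the words at least one of the two
# users has, the share that only one of them has
d <- dist(x, method = "binary")

# Hierarchical clustering on that distance, cut into three groups
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 3)
print(groups)

# Hypothetical helper: suggest the words that the nearest other user has
# and that `user` does not have yet
suggest_words <- function(user, x, d) {
  dm <- as.matrix(d)
  others <- setdiff(rownames(x), user)
  nearest <- others[which.min(dm[user, others])]
  colnames(x)[x[nearest, ] == 1 & x[user, ] == 0]
}

print(suggest_words("a7", x, d))
```

Whether a nearest-neighbour rule like this is adequate depends on how many users and words there are; with only three words, many users are tied or identical, so the suggestions are coarse.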

Incidentally, when I run the code above, I get the plot below, which is different from the one you've shown us. Perhaps this is more like what you are expecting? The green data point belongs to (is "around") the upper-right cluster centre indicated by a black asterisk.
