Cluster labels and cluster centers (kmeans in R)
Question
I am extremely new to R and am trying to work with a kmeans object. Ideally, I would like to take the list of cluster labels for each point in my data and replace each label with the corresponding center, ending up with a matrix in which each data point is represented by the center of the cluster that kmeans assigned it to.
Is there a way to do this efficiently, instead of going through each entry manually and replacing the cluster label with the cluster center value?
Thanks!
Recommended answer
Is this what you're after? Extended from this answer:
# make some data
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
#
# do cluster analysis
(cl <- kmeans(x, 2))
#
# put cluster labels with data
out1 <- data.frame(cbind(x, clusterNum = cl$cluster))
#
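# organise center coords to be ready for merging
# (cl$center resolves to cl$centers through partial matching of `$`;
# each column is wrapped separately so its name does not clash with x and y)
centers <- data.frame(cbind(data.frame(cl$center[,1]),
                            data.frame(cl$center[,2]),
                            clusterNum=rownames(cl$center)))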
#
# merge cluster center coords with data
out2 <- merge(out1, centers, all.x = TRUE)
#
# check output
out2
clusterNum x y cl.center...1. cl.center...2.
1 1 0.233161364 -0.04258146 0.01064895 0.01376516
2 1 -0.356284774 -0.59135602 0.01064895 0.01376516
3 1 -0.302272796 -0.24033113 0.01064895 0.01376516
4 1 -0.369299302 -0.24997660 0.01064895 0.01376516
5 1 -0.060454427 0.19711328 0.01064895 0.01376516
...
90 2 0.609833599 0.67729922 1.05184887 1.03445718
91 2 0.943306637 1.09420588 1.05184887 1.03445718
92 2 0.545053826 1.22620571 1.05184887 1.03445718
93 2 0.706921965 1.10326091 1.05184887 1.03445718
94 2 0.837644227 1.07121784 1.05184887 1.03445718
95 2 0.550863085 1.06977250 1.05184887 1.03445718
#
# Success! We have one dataframe that includes: raw data, cluster labels
# and cluster center coords
I used merge to put the cluster center coords with the raw data, but no doubt there are more efficient ways (for example, ones that don't require cl$center to be reorganised).
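One such way, offered here as a minimal sketch and not as part of the original answer, is to use the label vector as a row index into the centers matrix. cl$cluster holds one label per data point, so cl$centers[cl$cluster, ] repeats the matching center row for each point without any merge. The names centers_by_point, centers_fitted, out, center_x and center_y are made up for illustration, and set.seed(42) is an assumption added only so the sketch is repeatable; it will not reproduce the numbers printed above.

```r
# Simulated data, same shape as in the answer above
set.seed(42)  # assumption: seed added only to make this sketch repeatable
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")

cl <- kmeans(x, 2)

# Index the centers matrix by the label of each point: row i of the result
# is the center of the cluster that row i of x was assigned to
centers_by_point <- cl$centers[cl$cluster, , drop = FALSE]
rownames(centers_by_point) <- NULL  # drop the repeated "1"/"2" row names

# fitted() on a kmeans object returns the same per-point centers by default
centers_fitted <- fitted(cl)

# Raw data, cluster labels and matching center coords side by side
out <- data.frame(x,
                  clusterNum = cl$cluster,
                  center_x = centers_by_point[, "x"],
                  center_y = centers_by_point[, "y"])
head(out)
```

If only the matrix of centers is needed, as in the question, centers_by_point (or fitted(cl)) is already the result; the data frame is built only to mirror the layout of the merged output above.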