R: Cluster analysis with hclust(). How to get the cluster representatives?

Views: 216 | Posted: 2020/10/3 2:11:14 | Tags: r, cluster-analysis

Problem description

I am doing some cluster analysis with R. I am using the hclust() function and, after performing the cluster analysis, I would like to get the cluster representative of each cluster.

I define a cluster representative as the instance that is closest to the centroid of its cluster.

So the steps are:

1. Find the centroid of each cluster

2. Find the representative of each cluster
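The two steps can be sketched for a single cluster as follows. This is only an illustration on made-up toy points (the names `pts`, `centroid`, `dists` and `representative` are hypothetical, not from the question), and it assumes "closest" means smallest Euclidean distance.

```r
# Toy data (made up for illustration): four points in two dimensions,
# all treated as members of one cluster
pts <- matrix(c(0, 0,
                2, 0,
                0, 2,
                1, 1),
              ncol = 2, byrow = TRUE,
              dimnames = list(paste0("p", 1:4), c("x", "y")))

# Step 1: the centroid is the column-wise mean of the cluster's points
centroid <- colMeans(pts)

# Step 2: Euclidean distance from every point to the centroid;
# the representative is the point with the smallest distance
dists <- sqrt(rowSums(sweep(pts, 2, centroid)^2))
representative <- names(which.min(dists))
```

Note that the centroid itself is generally not a data point, which is why the second step is needed to pick an actual instance.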

I have already asked a similar question, but using K-means: https://stats.stackexchange.com/questions/251987/cluster-analysis-with-k-means-how-to-get-the-cluster-representatives

The problem, in this case, is that hclust doesn't give the centroids!

For example, say d is my data; what I have done so far is:

hclust.fit1 <- hclust(d, method="single")     
groups1 <- cutree(hclust.fit1, k=3) # cut tree into 3 clusters

## getting centroids ##

mycentroid <- colMeans(CV)    
clust.centroid = function(i, dat, groups1) {    
  ind = (groups1 == i)   
  colMeans(dat[ind,])
}

centroids <- sapply(unique(groups1), clust.centroid, data, groups1)

But now I am trying to get the cluster representatives with this code (I got it in the other question I asked, for k-means):

index <- c()

for (i in 1:3){    
  rowsum <- rowSums(abs(CV[which(centroids==i),1:3] - centroids[i,]))    
  index[i] <- as.numeric(names(which.min(rowsum)))   
}

It says:

"Error in e2[[j]] : index out of the limit"

I would be grateful if any of you could give me a little help. Thanks.

- (not) working example of the code -

example_data.txt

A,B,C
10.761719,5.452188,7.575762
10.830457,5.158822,7.661588
10.75391,5.500170,7.740330
10.686719,5.286823,7.748297
10.864527,4.883244,7.628730
10.701415,5.345650,7.576218
10.820583,5.151544,7.707404
10.877528,4.786888,7.858234
10.712337,4.744053,7.796390

As for the code:

# Install R packages
#install.packages("fpc")
#install.packages("cluster")
#install.packages("rgl")

library(fpc)
library(cluster)
library(rgl)

CV <- read.csv("example_data.txt")

str(CV)

data <- scale(CV)

d <- dist(data,method = "euclidean")
hclust.fit1 <- hclust(d, method="single") 
groups1 <- cutree(hclust.fit1, k=3) # cut tree into 3 clusters
mycentroid <- colMeans(CV)

clust.centroid = function(i, dat, groups1) {
  ind = (groups1 == i)
  colMeans(dat[ind,])
}

centroids <- sapply(unique(groups1), clust.centroid, CV, groups1)

index <- c()
for (i in 1:3){
  rowsum <- rowSums(abs(CV[which(centroids==i),1:3] - centroids[i,]))
  index[i] <- as.numeric(names(which.min(rowsum)))
}
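One likely cause of the error: `which(centroids==i)` compares the centroid coordinates with the cluster number, so it selects no rows, and subtracting a vector from a zero-row data frame fails inside the data-frame arithmetic. In addition, `sapply()` returns the centroids with one column per cluster, so `centroids[i,]` picks a variable rather than a cluster. A minimal sketch of one possible fix follows. It is not a definitive answer: the helper name `cluster.representative` is hypothetical, the distance is assumed to be Euclidean (the loop above sums absolute differences instead), and the centroids are computed on the scaled data because that is what was clustered.

```r
# The nine example rows from the question, inlined so the sketch is self-contained
CV <- read.csv(text = "A,B,C
10.761719,5.452188,7.575762
10.830457,5.158822,7.661588
10.75391,5.500170,7.740330
10.686719,5.286823,7.748297
10.864527,4.883244,7.628730
10.701415,5.345650,7.576218
10.820583,5.151544,7.707404
10.877528,4.786888,7.858234
10.712337,4.744053,7.796390")

data <- scale(CV)
d <- dist(data, method = "euclidean")
hclust.fit1 <- hclust(d, method = "single")
groups1 <- cutree(hclust.fit1, k = 3)  # cut tree into 3 clusters

# Hypothetical helper: for cluster i, return the row index of the member
# that lies nearest (Euclidean distance) to that cluster's centroid
cluster.representative <- function(i, dat, groups) {
  members <- which(groups == i)            # select rows by cluster label
  pts <- dat[members, , drop = FALSE]      # drop = FALSE keeps singleton clusters as a matrix
  centroid <- colMeans(pts)
  dists <- sqrt(rowSums(sweep(pts, 2, centroid)^2))
  unname(members[which.min(dists)])
}

cluster.ids <- sort(unique(groups1))
index <- sapply(cluster.ids, cluster.representative, dat = data, groups = groups1)

# Rows of the original (unscaled) data chosen as representatives, one per cluster
CV[index, ]
```

Selecting rows with `groups == i` rather than by comparing against the centroid values is the key change. With single linkage, some clusters may contain a single point, in which case that point is trivially its own representative.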
