How to draw the plot of within-cluster sum-of-squares for a cluster?
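Problem description
I have a cluster plot made in R, and I want to use a WSS plot to apply the "elbow criterion" to the clustering, but I do not know how to draw a WSS plot for a given clustering. Could anyone help?
Here is my data: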
Friendly<-c(0.467,0.175,0.004,0.025,0.083,0.004,0.042,0.038,0,0.008,0.008,0.05,0.096)
Polite<-c(0.117,0.55,0,0,0.054,0.017,0.017,0.017,0,0.017,0.008,0.104,0.1)
Praising<-c(0.079,0.046,0.563,0.029,0.092,0.025,0.004,0.004,0.129,0,0,0,0.029)
Joking<-c(0.125,0.017,0.054,0.383,0.108,0.054,0.013,0.008,0.092,0.013,0.05,0.017,0.067)
Sincere<-c(0.092,0.088,0.025,0.008,0.383,0.133,0.017,0.004,0,0.063,0,0,0.188)
Serious<-c(0.033,0.021,0.054,0.013,0.2,0.358,0.017,0.004,0.025,0.004,0.142,0.021,0.108)
Hostile<-c(0.029,0.004,0,0,0.013,0.033,0.371,0.363,0.075,0.038,0.025,0.004,0.046)
Rude<-c(0,0.008,0,0.008,0.017,0.075,0.325,0.313,0.004,0.092,0.063,0.008,0.088)
Blaming<-c(0.013,0,0.088,0.038,0.046,0.046,0.029,0.038,0.646,0.029,0.004,0,0.025)
Insincere<-c(0.075,0.063,0,0.013,0.096,0.017,0.021,0,0.008,0.604,0.004,0,0.1)
Commanding<-c(0,0,0,0,0,0.233,0.046,0.029,0.004,0.004,0.538,0,0.146)
Suggesting<-c(0.038,0.15,0,0,0.083,0.058,0,0,0,0.017,0.079,0.133,0.442)
Neutral<-c(0.021,0.075,0.017,0,0.033,0.042,0.017,0,0.033,0.017,0.021,0.008,0.717)
data <- data.frame(Friendly,Polite,Praising,Joking,Sincere,Serious,Hostile,Rude,Blaming,Insincere,Commanding,Suggesting,Neutral)
And here is my clustering code:
cor <- cor (data)
dist<-dist(cor)
hclust<-hclust(dist)
plot(hclust)
Running the code above gives me a dendrogram, but how can I draw a plot like this:
If I follow what you want, then we need a function to compute the WSS
wss <- function(d) {
sum(scale(d, scale = FALSE)^2)
}
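and a wrapper for this wss() function
wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)       # cluster membership when the tree is cut into i clusters
  spl <- split(x, cl)       # one data frame of rows per cluster
  sum(sapply(spl, wss))     # total WSS across all clusters
}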
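This wrapper takes the following arguments:
- i: the number of clusters to cut the data into
- hc: the hierarchical cluster analysis object
- x: the original data
wrap then cuts the dendrogram into i clusters, splits the original data by the cluster membership given in cl, and computes the WSS for each cluster. These WSS values are summed to give the WSS for that clustering.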
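A quick way to check the two functions is to run them on a tiny data set where the answer can be worked out by hand. The sketch below does that; the data frame `toy` and its values are made up for illustration, and complete linkage is simply the default of hclust().

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)

wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  spl <- split(x, cl)
  sum(sapply(spl, wss))
}

# Toy data (hypothetical): two tight pairs of points, far apart
toy <- data.frame(a = c(0, 1, 10, 11), b = c(0, 1, 10, 11))
hc_toy <- hclust(dist(toy))

one  <- wrap(1, hc_toy, toy)  # 202: total sum of squares about the overall mean
two  <- wrap(2, hc_toy, toy)  # 2: each pair is scored about its own mean
four <- wrap(4, hc_toy, toy)  # 0: every point is its own cluster
```

The large fall from 202 to 2 when going from one cluster to two, followed by almost no further gain, is the kind of "elbow" the plot is meant to reveal.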
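We run all of this using sapply over the number of clusters 1, 2, ..., nrow(data)
# `hclust` is the clustering object created in the question
res <- sapply(seq.int(1, nrow(data)), wrap, hc = hclust, x = data)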
A scree plot can be drawn using
plot(seq_along(res), res, type = "b", pch = 19)
Here is an example using the well-known Edgar Anderson Iris data set:
iris2 <- iris[, 1:4] # drop Species column
cl <- hclust(dist(iris2), method = "ward.D")
## Takes a little while as we evaluate all implied clustering up to 150 groups
res <- sapply(seq.int(1, nrow(iris2)), wrap, hc = cl, x = iris2)
plot(seq_along(res), res, type = "b", pch = 19)
This gives the scree plot over all 150 possible numbers of clusters.
We can zoom in by showing just the first 50 clusters
plot(seq_along(res[1:50]), res[1:50], type = "o", pch = 19)
which gives the zoomed-in scree plot.
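The elbow is normally judged by eye, but as a rough numeric aid the fall in WSS from one cluster count to the next can be tabulated. This is only a sketch: the names `gain` and `elbow_tab` are made up here, and limiting the range to 10 clusters is an arbitrary choice to keep the table short.

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)

wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  spl <- split(x, cl)
  sum(sapply(spl, wss))
}

iris2 <- iris[, 1:4]                          # drop Species column
cl <- hclust(dist(iris2), method = "ward.D")
res <- sapply(1:10, wrap, hc = cl, x = iris2)

# Reduction in WSS obtained by going from k - 1 to k clusters
gain <- -diff(res)
elbow_tab <- data.frame(k = 2:10, wss = res[-1], gain = gain)
print(elbow_tab)
```

Because the clusterings produced by cutree() are nested, WSS cannot increase as k grows, so `gain` is never negative; the elbow is roughly where `gain` stops being large relative to the earlier rows.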
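You can speed up the main computation step either by running the sapply() via an appropriate parallelised alternative, or by doing the computation for fewer than nrow(data) clusters, e.g.
res <- sapply(seq.int(1, 50), wrap, hc = cl, x = iris2) ## 1st 50 groups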
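One possible parallelised alternative is mclapply() from the parallel package that ships with R. The sketch below assumes two cores are available and falls back to a single core on Windows, where forking is not supported; `n_cores`, `res_par` and `res_ser` are names made up for this example.

```r
library(parallel)  # part of the standard R distribution

wss <- function(d) sum(scale(d, scale = FALSE)^2)

wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  spl <- split(x, cl)
  sum(sapply(spl, wss))
}

iris2 <- iris[, 1:4]                          # drop Species column
cl <- hclust(dist(iris2), method = "ward.D")

# mclapply() relies on forking, which is unavailable on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mclapply() returns a list, so flatten it to a numeric vector
res_par <- unlist(mclapply(seq_len(50), wrap, hc = cl, x = iris2,
                           mc.cores = n_cores))

# Serial version for comparison; both should agree
res_ser <- sapply(seq_len(50), wrap, hc = cl, x = iris2)
all.equal(res_par, res_ser)
```

For a data set as small as iris the overhead of starting worker processes may outweigh the gain, so this is mainly worth trying when the number of rows or candidate cluster counts is large.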