How to draw the plot of within-cluster sum-of-squares for a cluster?


Problem description

I have a cluster plot produced in R, and I want to use a WSS plot to apply the "elbow criterion" to the clustering, but I do not know how to draw a WSS plot for a given clustering. Can anyone help me?

Here is my data:

Friendly<-c(0.467,0.175,0.004,0.025,0.083,0.004,0.042,0.038,0,0.008,0.008,0.05,0.096)
Polite<-c(0.117,0.55,0,0,0.054,0.017,0.017,0.017,0,0.017,0.008,0.104,0.1)
Praising<-c(0.079,0.046,0.563,0.029,0.092,0.025,0.004,0.004,0.129,0,0,0,0.029)
Joking<-c(0.125,0.017,0.054,0.383,0.108,0.054,0.013,0.008,0.092,0.013,0.05,0.017,0.067)
Sincere<-c(0.092,0.088,0.025,0.008,0.383,0.133,0.017,0.004,0,0.063,0,0,0.188)
Serious<-c(0.033,0.021,0.054,0.013,0.2,0.358,0.017,0.004,0.025,0.004,0.142,0.021,0.108)
Hostile<-c(0.029,0.004,0,0,0.013,0.033,0.371,0.363,0.075,0.038,0.025,0.004,0.046)
Rude<-c(0,0.008,0,0.008,0.017,0.075,0.325,0.313,0.004,0.092,0.063,0.008,0.088)
Blaming<-c(0.013,0,0.088,0.038,0.046,0.046,0.029,0.038,0.646,0.029,0.004,0,0.025)
Insincere<-c(0.075,0.063,0,0.013,0.096,0.017,0.021,0,0.008,0.604,0.004,0,0.1)
Commanding<-c(0,0,0,0,0,0.233,0.046,0.029,0.004,0.004,0.538,0,0.146)
Suggesting<-c(0.038,0.15,0,0,0.083,0.058,0,0,0,0.017,0.079,0.133,0.442)
Neutral<-c(0.021,0.075,0.017,0,0.033,0.042,0.017,0,0.033,0.017,0.021,0.008,0.717)

data <- data.frame(Friendly,Polite,Praising,Joking,Sincere,Serious,Hostile,Rude,Blaming,Insincere,Commanding,Suggesting,Neutral)

And here is my code of clustering:

cor <- cor (data)
dist<-dist(cor)
hclust<-hclust(dist)
plot(hclust)

Running the code above produces a dendrogram. How can I then draw a within-cluster sum-of-squares (WSS) plot for it?

Solution

If I follow what you want, then we need a function to compute the WSS of a single cluster:

wss <- function(d) {
  # centre each column on its mean, then sum the squared deviations
  sum(scale(d, scale = FALSE)^2)
}
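As a quick check of what wss() computes, the sketch below applies it to a tiny made-up data frame (the values are illustrative only and do not come from the question) and compares the result with the sum of squared deviations from each column mean, computed by hand.

```r
wss <- function(d) {
  sum(scale(d, scale = FALSE)^2)
}

# Hypothetical toy data, for illustration only
toy <- data.frame(a = c(1, 2, 3), b = c(2, 4, 6))

by_function <- wss(toy)

# The same quantity computed column by column:
# a has deviations -1, 0, 1 (sum of squares 2); b has -2, 0, 2 (sum of squares 8)
by_hand <- sum((toy$a - mean(toy$a))^2) + sum((toy$b - mean(toy$b))^2)

print(by_function)  # 10
print(by_hand)      # 10
```

A cluster with a single observation has a WSS of 0, because every value then equals its own column mean.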

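We also need a wrapper for this wss() function:

wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)      # cluster membership when the tree is cut into i groups
  spl <- split(x, cl)      # one data frame per cluster (x must be a data frame)
  sum(sapply(spl, wss))    # total WSS for this clustering
}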

This wrapper takes the following arguments:

  • i: the number of clusters to cut the data into
  • hc: the hierarchical cluster analysis object
  • x: the original data

wrap then cuts the dendrogram into i clusters, splits the original data according to the cluster membership given by cl, and computes the WSS for each cluster. These WSS values are summed to give the WSS for that clustering.
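To make the intermediate steps visible, here is a minimal sketch that performs them one at a time for a cut into 3 clusters. It assumes the built-in iris measurements as data; the choice of 3 clusters and the names x, spl, per_cluster and total are illustrative only.

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  sum(sapply(split(x, cl), wss))
}

x  <- iris[, 1:4]                          # numeric columns only
hc <- hclust(dist(x), method = "ward.D")

cl  <- cutree(hc, 3)                       # membership vector, one entry per row of x
spl <- split(x, cl)                        # list of 3 data frames, one per cluster
per_cluster <- sapply(spl, wss)            # WSS of each cluster
total <- sum(per_cluster)                  # WSS of the whole clustering

print(table(cl))                           # cluster sizes
print(per_cluster)
print(total)
```

The value of total should match wrap(3, hc, x), since the wrapper performs the same three steps.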

We run all of this using sapply over the number of clusters 1, 2, ..., nrow(data), where cl is the object returned by hclust() and x is the data that was clustered:

res <- sapply(seq.int(1, nrow(data)), wrap, hc = cl, x = data)

A screeplot can be drawn using

plot(seq_along(res), res, type = "b", pch = 19)
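One point to watch when adapting this to the question's code: there the clustered objects are the rows of the correlation matrix (dist() is applied to cor(data)), so it seems reasonable to pass those rows, not the raw data, as x. Because split() only splits by row for data frames, the matrix has to be converted first. The sketch below shows that variant under the assumption that the built-in mtcars data stands in for the question's data; the names cm, hc_vars, x_vars and res_vars are hypothetical.

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  sum(sapply(split(x, cl), wss))
}

cm      <- cor(mtcars)              # stand-in for cor(data) in the question
hc_vars <- hclust(dist(cm))         # clusters the variables, as in the question
x_vars  <- as.data.frame(cm)        # split() needs a data frame to split by row

res_vars <- sapply(seq_len(nrow(x_vars)), wrap, hc = hc_vars, x = x_vars)

plot(seq_along(res_vars), res_vars, type = "b", pch = 19,
     xlab = "Number of clusters", ylab = "Within-cluster sum of squares")
```

With as many clusters as rows, every cluster is a singleton and the WSS drops to 0, so the curve always ends at zero.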

Here is an example using the well-known Edgar Anderson Iris data set:

iris2 <- iris[, 1:4]  # drop Species column
cl <- hclust(dist(iris2), method = "ward.D")

## Takes a little while as we evaluate every implied clustering, up to 150 groups
res <- sapply(seq.int(1, nrow(iris2)), wrap, hc = cl, x = iris2)
plot(seq_along(res), res, type = "b", pch = 19)

This gives a plot of WSS against the number of clusters (1 to 150).

We can zoom in by showing only the results for the first 50 cluster counts:

plot(seq_along(res[1:50]), res[1:50], type = "o", pch = 19)


You can speed up the main computation step either by replacing sapply() with an appropriate parallelised alternative, or by doing the computation for fewer than nrow(data) clusters, e.g.

res <- sapply(seq.int(1, 50), wrap, hc = cl, x = iris2) ## 1st 50 groups
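One possible parallelised alternative is mclapply() from the parallel package that ships with R. The sketch below is only one way to do it, not the answer's own method: the use of 2 cores, the limit of 20 cluster counts and the names n_cores, res_seq and res_par are assumptions. mclapply() relies on forking, which is unavailable on Windows, so the code falls back to a single core there.

```r
library(parallel)

wss <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  sum(sapply(split(x, cl), wss))
}

iris2 <- iris[, 1:4]
cl <- hclust(dist(iris2), method = "ward.D")

# Forking is not supported on Windows, so use one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

ks <- seq_len(20)   # only the first 20 cluster counts, to keep the run short

res_seq <- sapply(ks, wrap, hc = cl, x = iris2)
res_par <- unlist(mclapply(ks, wrap, hc = cl, x = iris2, mc.cores = n_cores))

plot(ks, res_par, type = "b", pch = 19,
     xlab = "Number of clusters", ylab = "Within-cluster sum of squares")
```

Both versions return the same values; for a data set as small as iris the overhead of starting worker processes can outweigh any gain, so the parallel route mainly pays off with many rows.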
