Cluster presentation: dendrogram alternative in R


Question

I know dendrograms are quite popular. However, with a large number of observations and classes they become hard to follow, and I sometimes feel there should be a better way to present the same information. I have an idea but do not know how to implement it.

Consider the following dendrogram:

> data(mtcars)
> plot(hclust(dist(mtcars)))

I would like to plot it like a scatter plot, in which the distance between two points is drawn as a line, separate clusters (given an assumed threshold) are colored differently, and circle size is determined by the value of some variable.

Recommended answer

You are describing a fairly typical way of going about cluster analysis:

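  • Use a clustering algorithm (in this case, hierarchical clustering)
  • Decide on the number of clusters
  • Project the data onto a two-dimensional plane using some form of principal component analysis (the code below uses classical multidimensional scaling via cmdscale)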

The code:

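# Hierarchical clustering on Euclidean distances between cars
hc <- hclust(dist(mtcars))
# Cut the tree into 3 clusters
cluster <- cutree(hc, k=3)
# Project the distances onto two dimensions and attach the cluster labels
xy <- data.frame(cmdscale(dist(mtcars)), factor(cluster))
names(xy) <- c("x", "y", "cluster")
xy$model <- rownames(xy)

library(ggplot2)
# Scatter plot of the projection, one colour per cluster
ggplot(xy, aes(x, y)) + geom_point(aes(colour=cluster), size=3)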
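
The question also asks for a cut at a threshold, circle size driven by a variable, and lines between points. The following is a minimal sketch of one way to extend the code above, not the answerer's method. Its assumptions: the cut height of 200 is an arbitrary illustrative value, mpg is an arbitrary choice for the size variable, and the lines join each car to the centroid of its cluster, which is only one possible reading of "distance between two points".

```r
library(ggplot2)

d  <- dist(mtcars)
hc <- hclust(d)

# Cut the tree at a height threshold instead of a fixed number of clusters.
# h = 200 is an arbitrary illustrative value (assumption).
cluster <- cutree(hc, h = 200)

# Two-dimensional coordinates from classical multidimensional scaling
xy <- data.frame(cmdscale(d), cluster = factor(cluster))
names(xy)[1:2] <- c("x", "y")
xy$model <- rownames(xy)
xy$mpg   <- mtcars$mpg          # variable mapped to circle size (assumption)

# Centroid of each cluster, repeated for every member of that cluster
xy$cx <- ave(xy$x, xy$cluster)
xy$cy <- ave(xy$y, xy$cluster)

p <- ggplot(xy, aes(x, y, colour = cluster)) +
  geom_segment(aes(xend = cx, yend = cy), alpha = 0.4) +  # car-to-centroid lines
  geom_point(aes(size = mpg), alpha = 0.7) +
  geom_text(aes(label = model), size = 2.5, vjust = -1, show.legend = FALSE)
print(p)
```

Swapping `h = 200` for `k = 3` reproduces the clustering used in the answer; the rest of the sketch is unchanged.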

What happens next is that you get a skilled statistician to help explain what the x and y axes mean. This usually involves projecting the data onto the axes and extracting the factor loadings.
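
As a rough illustration of extracting loadings, the sketch below uses `prcomp`, which is not part of the original answer. It relies on the fact that classical multidimensional scaling of Euclidean distances gives the same coordinates as a centered, unscaled PCA, up to the sign of each axis. Because the variables are left unscaled here (as in the answer's `dist(mtcars)` call), the loadings will tend to be dominated by variables measured on large numeric scales, such as disp and hp.

```r
d   <- dist(mtcars)
mds <- cmdscale(d)       # coordinates used for the scatter plot
pca <- prcomp(mtcars)    # centered, unscaled PCA on the same data

# The MDS axes and the first two principal components should agree
# up to sign, so the absolute correlations should be essentially 1.
agree_x <- abs(cor(mds[, 1], pca$x[, 1]))
agree_y <- abs(cor(mds[, 2], pca$x[, 2]))
print(c(agree_x, agree_y))

# Loadings: how much each original variable contributes to each axis
loadings <- pca$rotation[, 1:2]
print(round(loadings, 2))
```

Reading down each column of `loadings` shows which variables drive the x and y axes of the plot.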

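The plot: [figure not included; it is the scatter plot produced by the ggplot2 code above, with points colored by cluster]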
