How to create a "Clustergram" plot? (in R)
I came across this interesting website, which presents a way to visualize the output of a clustering algorithm, called a "Clustergram":
(Image: http://www.schonlau.net/images/clustergramexample.gif)
I am not sure how useful this really is, but I would like to play with it by reproducing it in R, and I don't know how to go about it.
How would you create a line for each item so that it stays consistent across the different numbers of clusters?
Here is some example code/data to play with for a potential answer:
hc <- hclust(dist(USArrests), "ave")
plot(hc)
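For the hierarchical clustering above, one way to follow each item across different numbers of clusters is to cut the tree at several values of k. What follows is only a minimal sketch, assuming base R's cutree(); the names ks, membership and item.score are illustrative, and summarizing each state by its row mean is an arbitrary choice.

```r
hc <- hclust(dist(USArrests), "ave")

ks <- 2:6                           # illustrative range of cluster counts
membership <- cutree(hc, k = ks)    # one row per state, one column per k

# One-dimensional summary per state (an assumption; any score would do)
item.score <- rowMeans(USArrests)

# For each k, replace every state's score by the mean score of its cluster
y <- sapply(seq_along(ks), function(j) ave(item.score, membership[, j]))

# One line per state, followed across the different numbers of clusters
matplot(ks, t(y), type = "l", lty = 1,
        xlab = "Number of clusters", ylab = "Cluster mean")
```

Because every row of membership refers to the same state for every k, each line stays tied to one item throughout.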
Update: I posted a solution with a lengthy example and discussion here (it is based on the code I give below). Also, Hadley was very kind and offered a ggplot2 implementation of the code.
Here is a basic solution (for a better one, look at the "update" above):
set.seed(100)
# Two Gaussian blobs of 50 points each, in two dimensions
Data <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
              matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(Data) <- c("x", "y")
# noise <- runif(100,0,.05)
# Vertical offset between neighbouring lines within one cluster
line.width <- rep(.004, nrow(Data))
Y <- NULL
X <- NULL
k.range <- 2:10
# Empty canvas; the lines and points are added afterwards
plot(0, 0, col = "white", xlim = c(1, 10), ylim = c(-.5, 1.6),
     xlab = "Number of clusters", ylab = "Clusters means",
     main = "(Basic) Clustergram")
axis(side = 1, at = k.range)
abline(v = k.range, col = "grey")
centers.points <- list()
for (k in k.range) {
  cl <- kmeans(Data, k)
  clusters.vec <- cl$cluster
  # One value per cluster: the mean of the centre's coordinates
  the.centers <- apply(cl$centers, 1, mean)
  # Cumulative offsets within each cluster, put back into the original row order
  noise <- unlist(tapply(line.width, clusters.vec,
                         cumsum))[order(seq_along(clusters.vec)[order(clusters.vec)])]
  noise <- noise - mean(range(noise))
  y <- the.centers[clusters.vec] + noise
  Y <- cbind(Y, y)
  x <- rep(k, length(y))
  X <- cbind(X, x)
  # Index by name so the list has no empty first element (k starts at 2)
  centers.points[[as.character(k)]] <- data.frame(y = the.centers, x = rep(k, k))
  # points(the.centers ~ rep(k , k), pch = 19, col = "red", cex = 1.5)
}
library(colorspace)  # provides rainbow_hcl()
COL <- rainbow_hcl(100)
matlines(t(X), t(Y), pch = 19, col = COL, lty = 1, lwd = 1.5)
# add points
invisible(lapply(centers.points,
                 function(xx) { with(xx, points(y ~ x, pch = 19, col = "red", cex = 1.3)) }))
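The least obvious step in the loop is the nested order() call, which puts the per-cluster cumulative offsets back into the original row order. Here is a toy sketch of just that step, using a made-up cluster vector (cl.toy, width, grouped and offsets are illustrative names, not part of the code above):

```r
cl.toy <- c(2, 1, 2, 1, 1)            # hypothetical cluster labels for 5 items
width  <- rep(0.004, length(cl.toy))  # same per-line offset as line.width

# Cumulative offsets within each cluster; unlist() returns them grouped by cluster
grouped <- unlist(tapply(width, cl.toy, cumsum))

# order(order(cl.toy)) inverts the permutation that sorts items by cluster,
# so indexing with it restores the original item order
offsets <- unname(grouped[order(order(cl.toy))])
print(offsets)
```

In the loop above, seq_along(clusters.vec)[order(clusters.vec)] is the same vector as order(clusters.vec), so the expression there is equivalent to order(order(clusters.vec)).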