Colouring ggplot's plotmatrix by k-means clusters?
Problem description
I am trying to create a pairs plot of 6 data variables using ggplot2 and colour the points according to the k-means cluster they belong to. I read the documentation of the highly impressive 'GGally' package as well as an informal fix by Adam Laiacano [http://adamlaiacano.tumblr.com/post/13501402316/colored-plotmatrix-in-ggplot2]. Unfortunately, I could not find any way to get the desired output in either.
Here is some sample code:
#The Swiss fertility dataset has been used here
data_ <- read.csv("/home/tejaskale/Ubuntu One/IUCAA/Datasets/swiss.csv", header=TRUE)
data_ <- na.omit(data_)
u <- c(2, 3, 4, 5, 6, 7)
x <- data_[,u]
k <- 3
maxIterations <- 100
noOfStarts <- 100
filename <- 'swiss.csv'
library(ggplot2)
library(gridExtra)
library(GGally)
kmeansOutput <- kmeans(x, k, maxIterations, noOfStarts)
xNew <- cbind(x[,1:6], as.factor(kmeansOutput$cluster))
names(xNew)[7] <- 'cluster'
kmeansPlot <- ggpairs(xNew[,1:6], color=xNew$cluster)
OR
kmeansPlot <- plotmatrix(xNew[,1:6], mapping=aes(colour=xNew$cluster))
Both plots are created but aren't coloured according to clusters.
I hope I haven't missed an existing answer to this question on the forum, and I apologize if that is the case. Any help would be highly appreciated.
Thanks!
The following slight modification of plotmatrix2 works fine for me:
library(ggplot2)
library(plyr)  # provides defaults(), used below to merge aesthetic mappings

plotmatrix2 <- function (data, mapping = aes())
{
    # All off-diagonal (x, y) column pairs
    grid <- expand.grid(x = 1:ncol(data), y = 1:ncol(data))
    grid <- subset(grid, x != y)
    # One block of rows per pair; the full data is appended so that
    # its columns can be referenced by name in the mapping
    all <- do.call("rbind", lapply(1:nrow(grid), function(i) {
        xcol <- grid[i, "x"]
        ycol <- grid[i, "y"]
        data.frame(xvar = names(data)[ycol], yvar = names(data)[xcol],
            x = data[, xcol], y = data[, ycol], data)
    }))
    all$xvar <- factor(all$xvar, levels = names(data))
    all$yvar <- factor(all$yvar, levels = names(data))
    # Data for the density curves on the diagonal panels
    densities <- do.call("rbind", lapply(1:ncol(data), function(i) {
        data.frame(xvar = names(data)[i], yvar = names(data)[i],
            x = data[, i])
    }))
    # Force the faceting variables to be factors here as well
    densities$xvar <- factor(densities$xvar, levels = names(data))
    densities$yvar <- factor(densities$yvar, levels = names(data))
    # Fill in x and y unless the caller supplied them
    mapping <- defaults(mapping, aes_string(x = "x", y = "y"))
    class(mapping) <- "uneval"
    ggplot(all) + facet_grid(xvar ~ yvar, scales = "free") +
        geom_point(mapping, na.rm = TRUE) + stat_density(aes(x = x,
        y = ..scaled.. * diff(range(x)) + min(x)), data = densities,
        position = "identity", colour = "grey20", geom = "line")
}

# Colour by a column of the data, referenced by name
plotmatrix2(mtcars[,1:3],aes(colour = factor(cyl)))
It may be a ggplot2 version issue, but I had to force the faceting variables in the densities data frame to be factors (that seems broken to me even in the GGally version). Also, generally don't pass vectors to aes(), but simply column names.
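To illustrate the last point with the k-means case from the question, here is a minimal sketch. It assumes a recent GGally release in which ggpairs() takes a mapping argument, and it uses R's built-in swiss dataset in place of the CSV file from the question (an assumption made so the example is self-contained). The set.seed() call and the names km and p are illustrative only. The cluster labels are stored as a column of the data frame and referenced by name inside aes(), rather than passed in as an external vector.

```r
library(ggplot2)
library(GGally)

# Assumption: the built-in swiss dataset stands in for the question's CSV file
x <- swiss

# Illustrative seed so the random starts are reproducible
set.seed(42)
km <- kmeans(x, centers = 3, iter.max = 100, nstart = 100)

# Keep the cluster labels as a named factor column of the data frame
xNew <- data.frame(x, cluster = factor(km$cluster))

# Refer to the column by name in aes(); plot only the six numeric variables
p <- ggpairs(xNew, mapping = aes(colour = cluster), columns = 1:6)
print(p)
```

Because cluster is part of xNew, the colour aesthetic is evaluated within the same data frame as the plotted variables, while columns = 1:6 keeps the cluster column itself out of the matrix of panels.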