Colouring ggplot's plotmatrix by k-means clusters?


Problem description


I am trying to create a pairs plot of 6 data variables using ggplot2 and colour the points according to the k-means cluster they belong to. I read the documentation of the highly impressive 'GGally' package as well as an informal fix by Adam Laiacano [http://adamlaiacano.tumblr.com/post/13501402316/colored-plotmatrix-in-ggplot2]. Unfortunately, I could not find a way to get the desired output with either approach.

Here is some sample code:

#The Swiss fertility dataset has been used here

# A space inside an R string needs no backslash; "\ " is an invalid escape
data_ <- read.csv("/home/tejaskale/Ubuntu One/IUCAA/Datasets/swiss.csv", header=TRUE)
data_ <- na.omit(data_)

u <- c(2, 3, 4, 5, 6, 7)
x <- data_[,u]
k <- 3
maxIterations <- 100
noOfStarts <- 100
filename <- 'swiss.csv'

library(ggplot2)
library(gridExtra)
library(GGally)

kmeansOutput <- kmeans(x, centers = k, iter.max = maxIterations, nstart = noOfStarts)

xNew <- cbind(x[,1:6], as.factor(kmeansOutput$cluster))
names(xNew)[7] <- 'cluster'
kmeansPlot <- ggpairs(xNew[,1:6], color=xNew$cluster)

OR

kmeansPlot <- plotmatrix(xNew[,1:6], mapping=aes(colour=xNew$cluster))

Both plots are created but aren't coloured according to clusters.

I hope I haven't missed an existing answer to this question on the forum; I apologize if I have. Any help would be highly appreciated.

Thanks!

Solution

The following slight modification of plotmatrix2 works fine for me:

plotmatrix2 <- function (data, mapping = aes())
{
    grid <- expand.grid(x = 1:ncol(data), y = 1:ncol(data))
    grid <- subset(grid, x != y)
    all <- do.call("rbind", lapply(1:nrow(grid), function(i) {
        xcol <- grid[i, "x"]
        ycol <- grid[i, "y"]
        data.frame(xvar = names(data)[ycol], yvar = names(data)[xcol], 
            x = data[, xcol], y = data[, ycol], data)
    }))
    all$xvar <- factor(all$xvar, levels = names(data))
    all$yvar <- factor(all$yvar, levels = names(data))
    densities <- do.call("rbind", lapply(1:ncol(data), function(i) {
        data.frame(xvar = names(data)[i], yvar = names(data)[i], 
            x = data[, i])
    }))
    densities$xvar <- factor(densities$xvar, levels = names(data))
    densities$yvar <- factor(densities$yvar, levels = names(data))
    mapping <- defaults(mapping, aes_string(x = "x", y = "y"))
    class(mapping) <- "uneval"
    ggplot(all) + facet_grid(xvar ~ yvar, scales = "free") + 
        geom_point(mapping, na.rm = TRUE) + stat_density(aes(x = x, 
        y = ..scaled.. * diff(range(x)) + min(x)), data = densities, 
        position = "identity", colour = "grey20", geom = "line")
}


plotmatrix2(mtcars[,1:3],aes(colour = factor(cyl)))

It may be a ggplot2 version issue, but I had to force the faceting variables in the densities data frame to be factors (that seems broken to me even in the GGally version). Also, as a general rule, don't pass vectors such as xNew$cluster to aes(); keep the variable as a column of the plotted data frame and refer to it by its column name.
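
To make the last point concrete for the k-means case in the question, here is a minimal sketch rather than a definitive fix. It assumes the built-in swiss dataset in place of the asker's CSV file, a fixed seed so the clustering is reproducible, and a reasonably recent GGally in which ggpairs() accepts the mapping and columns arguments. The cluster label is stored as a column of the data frame and referred to by name inside aes(), while columns = 1:6 keeps the label itself out of the plot matrix.

```r
# Assumption: the built-in datasets::swiss (47 rows, 6 numeric columns)
# stands in for the CSV file read in the question.
x <- swiss

# Assumption: a fixed seed, only to make the clustering reproducible.
set.seed(42)
kmeansOutput <- kmeans(x, centers = 3, iter.max = 100, nstart = 100)

# Keep the cluster label as a named factor column of the data frame.
xNew <- data.frame(x, cluster = factor(kmeansOutput$cluster))

# Plot only if GGally is installed; ggplot2 is one of its dependencies.
if (requireNamespace("GGally", quietly = TRUE)) {
  # Refer to the column by name in aes(); do not pass xNew$cluster.
  kmeansPlot <- GGally::ggpairs(
    xNew,
    mapping = ggplot2::aes(colour = cluster),
    columns = 1:6
  )
  print(kmeansPlot)
}
```

Because the cluster column travels with the data, the same idea should carry over to plotmatrix2 above, provided the data frame handed to it contains the column named in aes().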
