Convex hull ggplot using data.tables in R


Question

I found a nice example of plotting convex hull shapes using ggplot with ddply here: Drawing outlines around multiple geom_point groups with ggplot

I thought I'd try something similar--create something like an Ashby Diagram--to practice with the data.table package:

test<-function()
{
library(data.table)
library(ggplot2)

set.seed(1)

Here I define a simple table:

dt<-data.table(xdata=runif(15),ydata=runif(15),level=rep(c("a","b","c"),each=5),key="level")

And then I define the hull positions by level:

hulls<-dt[,as.integer(chull(.SD)),by=level]
setnames(hulls,"V1","hcol")

So then my thought was to merge hulls with dt, so that I could eventually manipulate hulls to get in the proper form for ggplot (shown below for reference):

ashby<-ggplot(dt,aes(x=xdata,y=ydata,color=level))+
        geom_point()+
        geom_line()+
        geom_polygon(data=hulls,aes(fill=level))
}

But it seems that any way I try to merge hulls and dt, I get an error. For example, merge(hulls,dt) produces the error as shown in footnote 1.

This seems like it should be simple, and I'm sure I'm just missing something obvious. Any direction to a similar post or thoughts on how to prep hull for ggplot is greatly appreciated. Or if you think that it's best to stick with the ddply approach, please let me know.

Example undesired output:

test<-function(){
    library(data.table)
    library(ggplot2)
    dt<-data.table(xdata=runif(15),ydata=runif(15),level=rep(c("a","b","c"),each=5),key="level")
    set.seed(1)
    hulls<-dt[,as.integer(chull(.SD)),by=level]
    setnames(hulls,"V1","hcol")
    setkey(dt, 'level') #setting the key seems unneeded
    setkey(hulls, 'level')
    hulls<-hulls[dt, allow.cartesian = TRUE]
    ggplot(dt,aes(x=xdata,y=ydata,color=level))+
            geom_point()+
            geom_polygon(data=hulls,aes(fill=level))
}

This results in a mess of criss-crossing polygons.

Footnote 1: Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x), : Join results in 60 rows; more than 15 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including j and dropping by (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.

Solution

Here is what you want to do. Generating some random data:

library(ggplot2)
library(data.table)
# You have to set the seed _before_ you generate random data, not after
set.seed(1) 
dt <- data.table(xdata=runif(15), ydata=runif(15), level=rep(c("a","b","c"), each=5),
  key="level")

Here is where the magic happens:

hulls <- dt[, .SD[chull(xdata, ydata)], by = level]
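As a quick check of what that line returns, here is a minimal sketch that rebuilds the same data and inspects `hulls`. The assertion and the per-level count are illustrative additions, not part of the original answer.

```r
library(data.table)

set.seed(1)
dt <- data.table(xdata = runif(15), ydata = runif(15),
                 level = rep(c("a", "b", "c"), each = 5), key = "level")

# For each level, .SD holds that level's rows; chull() picks the hull rows.
hulls <- dt[, .SD[chull(xdata, ydata)], by = level]

# hulls has the same columns as dt, so geom_polygon() can use it directly.
stopifnot(setequal(names(hulls), names(dt)))

# Number of hull vertices kept per level (between 3 and 5 of the 5 points).
print(hulls[, .N, by = level])
```

Because `hulls` already carries `xdata`, `ydata` and `level`, no merge back onto `dt` is needed before plotting.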

Plotting the result:

ggplot(dt, aes(x = xdata, y = ydata, color = level)) +
    geom_point() +
    # alpha is a constant, so set it outside aes() to avoid a spurious legend entry
    geom_polygon(data = hulls, aes(fill = level), alpha = 0.5)

This produces a plot of the points with a filled convex hull around each level.

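It works because chull returns a vector of row indices, in clockwise order, for the points that form the convex hull. Within each group, .SD is the subset of dt for that level, so .SD[chull(xdata, ydata)] keeps only the hull rows, in that order, and data.table binds the per-group results together by level. Because the rows are already ordered around the hull, geom_polygon can draw each outline without criss-crossing. Joining the raw indices back onto dt by level, as in the question, instead pairs every hull index with every row of that level, which is why the join grows to 60 rows.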
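The index behaviour of chull can be seen on a tiny example in base R. This is only a sketch: the data (a unit square plus one interior point) and the names `pts`, `idx` and `hull` are made up for illustration.

```r
# Four corners of a unit square plus one interior point (made-up data).
pts <- data.frame(x = c(0, 1, 1, 0, 0.5),
                  y = c(0, 0, 1, 1, 0.5))

idx <- chull(pts$x, pts$y)  # integer row indices of the hull vertices
hull <- pts[idx, ]          # subset the rows, keeping the hull order

# The interior point (row 5) is not part of the hull.
stopifnot(!(5 %in% idx), length(idx) == 4)
print(hull)
```

Subsetting with the indices, rather than merging on them, is what `.SD[chull(xdata, ydata)]` does within each level.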
