Convex hull ggplot using data.tables in R

Question

I found a nice example of plotting convex hull shapes using ggplot with ddply here: Drawing outlines around multiple geom_point groups with ggplot

I thought I'd try something similar--create something like an Ashby Diagram--to practice with the data.table package:

test<-function()
{
library(data.table)
library(ggplot2)

set.seed(1)

Here I define a simple table:

dt<-data.table(xdata=runif(15),ydata=runif(15),level=rep(c("a","b","c"),each=5),key="level")

And then I define the hull positions by level:

hulls<-dt[,as.integer(chull(.SD)),by=level]
setnames(hulls,"V1","hcol")

So my thought was to merge hulls with dt, so that I could eventually manipulate hulls into the proper form for ggplot (shown below for reference):

ashby<-ggplot(dt,aes(x=xdata,y=ydata,color=level))+
        geom_point()+
        geom_line()+
        geom_polygon(data=hulls,aes(fill=level))
}

But it seems that any way I try to merge hulls and dt, I get an error. For example, merge(hulls,dt) produces the error shown in Footnote 1.

This seems like it should be simple, and I'm sure I'm just missing something obvious. Any pointer to a similar post, or thoughts on how to prepare hulls for ggplot, would be greatly appreciated. Or if you think it's best to stick with the ddply approach, please let me know.

Example that produces undesired output:

test<-function(){
    library(data.table)
    library(ggplot2)
    dt<-data.table(xdata=runif(15),ydata=runif(15),level=rep(c("a","b","c"),each=5),key="level")
    set.seed(1)
    hulls<-dt[,as.integer(chull(.SD)),by=level]
    setnames(hulls,"V1","hcol")
    setkey(dt, 'level') #setting the key seems unneeded
    setkey(hulls, 'level')
    hulls<-hulls[dt, allow.cartesian = TRUE]
    ggplot(dt,aes(x=xdata,y=ydata,color=level))+
            geom_point()+
            geom_polygon(data=hulls,aes(fill=level))
}

results in a mess of criss-crossing polygons.

Footnote 1:

Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x), : Join results in 60 rows; more than 15 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including j and dropping by (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
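
To see where the extra rows come from, here is a minimal sketch that recreates the question's tables (using chull(xdata, ydata), which is equivalent to the question's chull(.SD) here because .SD holds just xdata and ydata). The names hull_counts and joined are hypothetical, for illustration only. Joining on level alone pairs every hull row with every point of the same level, so each level contributes (hull vertices) x (points) rows:

```r
library(data.table)

set.seed(1)
dt <- data.table(xdata = runif(15), ydata = runif(15),
                 level = rep(c("a", "b", "c"), each = 5), key = "level")

# Group-relative hull indices, as in the question
hulls <- dt[, as.integer(chull(xdata, ydata)), by = level]
setnames(hulls, "V1", "hcol")

# hull_counts (illustrative name): number of hull vertices in each level
hull_counts <- hulls[, .N, by = level]

# joined (illustrative name): joining on level alone is a cartesian
# product within each level, so it must be explicitly allowed
joined <- merge(hulls, dt, by = "level", allow.cartesian = TRUE)

# Each level contributes (hull vertices) x (5 points) rows
nrow(joined)
sum(hull_counts$N * 5)
```

The hcol values never take part in the join, which is why no amount of merging on level lines the hull indices up with the right points.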

Solution

Here is what you want to do. First, generate some random data:

library(ggplot2)
library(data.table)
# You have to set the seed _before_ you generate random data, not after
set.seed(1) 
dt <- data.table(xdata=runif(15), ydata=runif(15), level=rep(c("a","b","c"), each=5),
  key="level")
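
The seed comment matters because set.seed() only affects random numbers drawn after it is called; in the question's second listing it comes after runif(), so dt differs on every run. A quick check in plain R (first and second are illustrative names):

```r
# Seeding before drawing makes the draws reproducible
set.seed(1)
first <- runif(5)   # illustrative name

set.seed(1)
second <- runif(5)  # illustrative name

identical(first, second)  # TRUE
```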

Here is where the magic happens:

hulls <- dt[, .SD[chull(xdata, ydata)], by = level]
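
To see what that one-liner does for a single level, here is a sketch that applies chull() to level "a" by hand. The names grp, idx and signed_area are hypothetical helpers for illustration. chull() returns row positions within the group, in clockwise order (per R's documentation), so indexing with them yields the hull vertices in the order geom_polygon() should connect them; the shoelace formula gives a negative signed area for clockwise vertices:

```r
library(data.table)

set.seed(1)
dt <- data.table(xdata = runif(15), ydata = runif(15),
                 level = rep(c("a", "b", "c"), each = 5), key = "level")

# grp (illustrative name): the rows of one level on their own
grp <- dt[level == "a"]

# idx (illustrative name): positions 1..5 within grp of the hull vertices
idx <- chull(grp$xdata, grp$ydata)
print(idx)
print(grp[idx])  # hull vertices in drawing order

# signed_area (hypothetical helper): shoelace formula,
# negative when vertices run clockwise
signed_area <- function(x, y) {
  nxt <- c(seq_along(x)[-1], 1)  # index of the next vertex, wrapping around
  sum(x * y[nxt] - x[nxt] * y) / 2
}
signed_area(grp$xdata[idx], grp$ydata[idx])

# The grouped version returns the same rows for level "a"
hulls <- dt[, .SD[chull(xdata, ydata)], by = level]
```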

Plotting the result:

ggplot(dt, aes(x = xdata, y = ydata, color = level)) +
    geom_point() +
    # alpha is set outside aes() so it is a fixed transparency, not a mapped variable
    geom_polygon(data = hulls, aes(fill = level), alpha = 0.5)

produces a scatter plot of the points with a semi-transparent, filled convex hull drawn around each level.

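It works because chull returns the indices of the points that lie on the convex hull, in clockwise order, so selecting those rows gives the polygon's vertices in drawing order. We then subset each group's data with .SD[...], and data.table stacks the per-group results into one table, with the level column in front. Merging on level alone, as in the question, cannot work: every hull index matches every point of the same level (the cartesian product the error complains about), so the joined table holds every point repeatedly instead of just the hull vertices in boundary order, which is why the polygons criss-cross.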
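
If you prefer the question's idea of computing hull positions first and attaching them to the data afterwards, one variant (a sketch, not part of the original answer) is to map the group-relative indices from chull() to global row numbers with data.table's .I, then subset dt by those rows instead of joining on level. The names hull_rows, hulls_i and hulls_sd are illustrative:

```r
library(data.table)

set.seed(1)
dt <- data.table(xdata = runif(15), ydata = runif(15),
                 level = rep(c("a", "b", "c"), each = 5), key = "level")

# .I holds each group's row numbers in dt; indexing it with chull()
# turns within-group positions into global row numbers
hull_rows <- dt[, .I[chull(xdata, ydata)], by = level]$V1  # illustrative name

# A plain row subset: no join, no cartesian blow-up, hull order kept
hulls_i <- dt[hull_rows]  # illustrative name

# The .SD version from the answer, for comparison
hulls_sd <- dt[, .SD[chull(xdata, ydata)], by = level]  # illustrative name
```

Either table can be passed as data = ... to geom_polygon(); the .SD form is shorter, while the .I form makes the index bookkeeping explicit.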
