data.table alternative to piping


Problem description

I'm currently learning the robust and efficient data.table package, but I can't figure out how to do the following. I want to group by multiple columns (manufacturer and carrier), count the flights in each group, arrange the counts in descending order, and then plot the top 10 manufacturer/carrier combinations with ggplot. In the tidyverse I would do it like this:

library(nycflights13)
library(tidyverse)
flights %>% 
  left_join(planes, by = "tailnum") %>% 
  group_by(manufacturer, carrier) %>% 
  summarise(N = n()) %>% 
  arrange(desc(N)) %>% 
  top_n(10, N) %>% 
  ggplot(aes(carrier, N, fill = manufacturer)) + geom_col() + guides(fill = FALSE)

Here is what I've tried (I spent several minutes trying to solve it before posting, but failed):

library(data.table)
fly <- copy(nycflights13::flights)
setDT(fly)
setkey(fly, tailnum)
planes1 <- copy(planes)
setDT(planes1)
setkey(planes1, tailnum)
# head(planes1, 2)
Merged <- merge(fly, planes1, by = "tailnum")
# Group by manufacturer and carrier
Merged[, .N, by = .(manufacturer, carrier)] # [, order(manufacturer, carrier)]

The problem is that I can't get the ordered data back, and I also don't know how to "chain" into ggplot without first saving the ordered merge as an object.

Solution

You can use square brackets ([ and ]) to chain operations together in data.table. Furthermore, you can execute a ggplot call inside the j part of the syntax:

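# columns to bring over from planes1 (everything except the join key)
nms <- setdiff(names(planes1), "tailnum")

fly[planes1, on = .(tailnum), (nms) := mget(nms)  # update join: add the plane columns to fly
    ][, .N, by = .(manufacturer, carrier)         # count flights per manufacturer/carrier
      ][order(-N)                                 # sort by count, descending
        ][, head(.SD, 10), by = manufacturer      # up to 10 carriers per manufacturer (ties not kept)
          ][, ggplot(.SD, aes(carrier, N, fill = manufacturer)) +
              geom_col() +
              guides(fill = FALSE)]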

which gives a stacked bar chart of N per carrier, with the bars filled by manufacturer.
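
The chaining idea can be tried without the nycflights13 data. The following is a minimal sketch on a made-up table: `toy`, its values, and the cut-off of 2 rows are assumptions for illustration only, not part of the original answer. It groups, orders by count, and keeps the top rows overall in one chained expression.

```r
library(data.table)

# Hypothetical stand-in for the joined flights/planes table (made-up values).
toy <- data.table(
  manufacturer = c("BOEING", "BOEING", "AIRBUS", "AIRBUS", "AIRBUS", "EMBRAER"),
  carrier      = c("UA", "UA", "B6", "B6", "B6", "EV")
)

top <- toy[, .N, by = .(manufacturer, carrier)  # count rows per manufacturer/carrier
           ][order(-N)                          # sort by count, descending
             ][, head(.SD, 2)]                  # keep the first 2 rows overall

print(top)
```

Using `head(.SD, 10)` without `by` in the last step would keep the 10 most frequent manufacturer/carrier combinations overall. A further `[, ggplot(.SD, ...)]` step could be appended in the same way if ggplot2 is loaded.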
