Faster equivalent to group_by %>% expand in R

Question

I am trying to create a sequence of years for multiple IDs in R. My input table has a single row for each ID, and gives a Start_year. It looks like this:

ID    Start_year
01          1999
02          2004
03          2015
04          2007

etc.

I need to create a table with multiple rows for each ID, showing each year from their Start_year up to 2015. I will then use this to join to another table. So in my example, ID1 would have 17 rows with the years 1999:2015. ID2 would have 12 rows 2004:2015, ID3 would have 1 row 2015, and ID4 would have 9 rows 2007:2015.
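Each ID needs 2015 - Start_year + 1 rows, which is where 17, 12, 1 and 9 come from. As a minimal base-R sketch (not from the original answer, and assuming R >= 4.0.0, where sequence() gained its from argument), the whole expansion can be built without any grouping by repeating each ID that many times and generating every run of years in one vectorized call:

```r
# Toy input matching the table above (IDs stored as integers here)
df <- data.frame(
  ID         = c(1L, 2L, 3L, 4L),
  Start_year = c(1999L, 2004L, 2015L, 2007L)
)
end_year <- 2015L

# Rows needed per ID: Start_year through end_year inclusive
n_years <- end_year - df$Start_year + 1L
print(n_years)

# Repeat each ID n_years times; sequence(n, from) concatenates
# seq(from[i], by = 1, length.out = n[i]) for every i in one call
out <- data.frame(
  ID   = rep(df$ID, times = n_years),
  year = sequence(n_years, from = df$Start_year)
)
print(nrow(out))
```

Because rep() and sequence() each process the whole input at once, this sketch avoids calling an R function once per ID, which is one plausible reason a grouped approach slows down with millions of groups.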

For a subset of my data I can get this to work using the following code:

df %>% group_by(ID) %>% expand(year = Start_year:2015, Start_year) %>% select(-Start_year)

However, my full dataset has about 5 million IDs, and this command seems to be extremely slow, taking many hours.

I'm therefore looking for a faster implementation of this command in R. In my experience, data.table commands often seem to be faster than dplyr/tidyr - however, I am quite unfamiliar with data.table syntax.

Answer

With data.table you can do:

out <- DT[, .(col = seq.int(Start_year, 2015L)), by = ID]
out
#    ID  col
# 1:  1 1999
# 2:  1 2000
# 3:  1 2001
# 4:  1 2002
# 5:  1 2003
# 6:  1 2004
# 7:  1 2005
# 8:  1 2006
# 9:  1 2007
# ...

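If df is still a plain data.frame, you may need to convert it first (setDT converts it to a data.table by reference, without copying):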

setDT(df)[, .(col = seq.int(Start_year, 2015L)), by = ID]

A tidyverse way of doing the same thing:

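library(readr); library(dplyr); library(tidyr)
tbl <- read_table(text)

tbl %>% 
  group_by(ID) %>% 
  # replace each Start_year with a list-column holding Start_year:2015
  mutate(Start_year = list(seq.int(Start_year, 2015L))) %>%
  # optionally add: rename(year = Start_year) %>%
  unnest(cols = Start_year)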
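The tidyverse version stores each ID's years in a list-column and then flattens it with unnest(). For comparison, here is a base-R sketch of the same list-then-flatten idea (our own illustration, using a toy df with integer IDs rather than the tbl read above):

```r
df <- data.frame(
  ID         = c(1L, 2L, 3L, 4L),
  Start_year = c(1999L, 2004L, 2015L, 2007L)
)

# One integer vector of years per ID, like the list-column built by mutate()
years <- Map(seq.int, df$Start_year, 2015L)

# "Unnest": repeat each ID once per year and flatten the list
out <- data.frame(
  ID   = rep(df$ID, times = lengths(years)),
  year = unlist(years, use.names = FALSE)
)
print(out[out$ID == 4L, ])
```

This still calls seq.int() once per ID, so it mirrors the grouped approaches rather than the fully vectorized ones.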

Data

text <- "ID    Start_year
01          1999
02          2004
03          2015
04          2007"

library(data.table)
DT <- fread(text)
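As the data.table output above shows (ID 1 rather than 01), fread() parses the ID column as a number, so leading zeros are dropped. If the IDs must match character keys in the table you later join to, a minimal base-R sketch that keeps them as text (this assumes your real IDs are zero-padded strings; colClasses is a standard read.table() argument):

```r
text <- "ID    Start_year
01          1999
02          2004
03          2015
04          2007"

# Read ID as character so leading zeros survive; Start_year as integer
df <- read.table(text = text, header = TRUE,
                 colClasses = c("character", "integer"))
str(df)
```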